Updated: Feb 19, 2021
You've identified a data analytics project, and now the question is “How do I start?” Our advice might sound like a paradox, but it's pretty logical: you start with the end.
Disclaimer before we dive in: You can't begin with the end if you don't even know what you're starting. If you need help focusing your data initiatives and choosing what projects to prioritize, sign up for our 2-week data expedition.
Now that we have that out of the way… Why begin with the end?
Good UX: Give the people what they need
The stakeholders who need data-driven insight, for the most part, do not have a data analytics degree. They most likely have their own terminology that they use to communicate. You have to make sure that the results of your model or your data and analytics project resonate with them. It is important to deliver insights that are not only meaningful and understandable but also presented in their terminology. Instead of saying that the model has an F1 score of 90%, describe the result in their terms, for instance “about 9 of every 10 cases the model flags are real, and it catches about 9 of every 10 real cases” (which is what precision and recall of roughly 90% each would mean). Be careful not to shorten this to “the model is correct 90% of the time”: that describes accuracy, which is a different metric from the F1 score. Data science terms like F1 score will not connect with stakeholders and will go over their heads. Without user-specific terminology, data initiatives will miss the mark, and that would be a shame. That is why it is so important to map the initiatives out in a language the end-users understand, so they can interpret their data in a way that suits their needs.
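To make the gap between an F1 score and "percent correct" concrete, here is a minimal sketch in plain Python. The labels are made-up illustrative data (10 real cases out of 100), and the helper names `confusion_counts`, `f1_score`, `accuracy`, and `plain_language_summary` are hypothetical, not part of any library.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = real case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall; it ignores true negatives."""
    return 2 * tp / (2 * tp + fp + fn)


def accuracy(tp, fp, fn, tn):
    """Accuracy is the share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def plain_language_summary(tp, fp, fn):
    """Phrase the result in terms a stakeholder can act on."""
    flagged = tp + fp
    real = tp + fn
    return (f"Of the {flagged} cases the model flagged, {tp} were real. "
            f"It caught {tp} of the {real} real cases.")


# Illustrative data only: 10 real cases followed by 90 non-cases.
y_true = [1] * 10 + [0] * 90
# The model catches 9 of the 10 real cases and raises 1 false alarm.
y_pred = [1] * 9 + [0] * 1 + [1] * 1 + [0] * 89

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
acc = accuracy(tp, fp, fn, tn)
summary = plain_language_summary(tp, fp, fn)

print(f"F1 = {f1:.2f}, accuracy = {acc:.2f}")
print(summary)
```

On this made-up data the F1 score is 0.90 while the accuracy is 0.98, so the same model would be described two different ways depending on which number is read as "percent correct". The plain-language summary sidesteps the jargon altogether.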
Due Diligence: Don’t promise what you can’t deliver
The second reason to start from the end is to vet the path to production. Even if production might not seem that relevant for a small project, you still have to ask, “If this is a success, how is this going to run in production?” Building a proof of concept in a Jupyter notebook on your laptop is a completely different ball game from building a full-blown, secure, scalable data pipeline. If you know that this project has the potential to be used in production, you have to start taking the production requirements into account. Doing so lets you think about the kind of model that could be used, the data the model needs, and where that data can be found. Once you know all that, you can work your way backward and map each requirement to the steps that come before it. This will allow you to identify when a promising model is not feasible in production and shift gears to a solution with better odds of success before wasting time and resources on an initiative bound for failure.
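One way to picture this vetting step is a small checklist that compares each candidate model against the production requirements before any serious work begins. The sketch below is only an illustration: the classes `ProductionRequirements` and `CandidateModel`, the function `blockers`, and all of the values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionRequirements:
    """What production can support (hypothetical fields for illustration)."""
    max_latency_ms: float
    available_data_sources: frozenset


@dataclass(frozen=True)
class CandidateModel:
    """What a candidate model would need (hypothetical fields for illustration)."""
    name: str
    expected_latency_ms: float
    required_data_sources: frozenset


def blockers(candidate, reqs):
    """Return the reasons this candidate cannot run in production (empty if none)."""
    problems = []
    if candidate.expected_latency_ms > reqs.max_latency_ms:
        problems.append(
            f"latency {candidate.expected_latency_ms} ms exceeds "
            f"limit of {reqs.max_latency_ms} ms"
        )
    missing = candidate.required_data_sources - reqs.available_data_sources
    if missing:
        problems.append("missing data sources: " + ", ".join(sorted(missing)))
    return problems


# Made-up requirements and candidates, purely for illustration.
reqs = ProductionRequirements(
    max_latency_ms=200.0,
    available_data_sources=frozenset({"orders", "customers"}),
)
simple = CandidateModel("simple", 50.0, frozenset({"orders"}))
ambitious = CandidateModel("ambitious", 900.0, frozenset({"orders", "clickstream"}))

for candidate in (simple, ambitious):
    found = blockers(candidate, reqs)
    status = "feasible" if not found else "blocked: " + "; ".join(found)
    print(f"{candidate.name}: {status}")
```

A real project would track more than latency and data availability (security, cost, and maintenance, for example), but even a short list like this surfaces a blocked candidate before time is spent building it.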
In conclusion, usability and risk assessment largely sum up why it's important to work your way backward. At the beginning, figure out what you would like to achieve, and then consider the pipeline: the raw data, data acquisition, data cleaning, building the model, and so on. Essentially, it is a cycle that keeps the end goal in mind. To illustrate with a metaphor, beginning with the end is the difference between building a race car you are not sure anybody wants and working with someone from Ferrari who is looking for a Formula One car that can do laps in less than one minute and twenty seconds. The clear requirements and a known end-user make all the difference.
Data and Analytics Lead
Missing Link Technologies ltd.
Written in collaboration with:
Missing Link Technologies ltd.