What does a Data Scientist ACTUALLY do?
Date
Oct 20, 2020
Category
Career & Roles
Ever since I started working with data, I have constantly heard about Data Science, so I always wanted to learn more about it. The problem is that when you start exploring what Data Scientists actually do at work, it can get quite confusing.
The Data Scientist Role in a Nutshell
We all understand that Data Science is about using data to create as much impact as possible for a company. To create this impact, a Data Scientist relies on tools such as building complicated models, designing neat charts, or writing code.
Basically, a Data Scientist solves real company problems, using data from the past to predict the future.
Two Sources of Misconception
There are many misconceptions about Data Science, and two likely reasons for this are:
1. The Media
There’s a huge misalignment between what’s popular to talk about and what’s needed in the industry. Machine learning and AI dominate the media, overshadowing every other aspect of Data Science. As a result, the general public thinks of Data Scientists as researchers focused on machine learning and AI, when in reality the industry sometimes hires Data Scientists who will never perform any AI or machine learning-related tasks (perhaps because the company is not ready for them yet).
2. HR
Another cause of confusion is Human Resources managers, who understandably can become overwhelmed by the barrage of new terms and buzzwords flying around. This leads them to label job positions inaccurately. One HR representative may advertise a Data Analytics Specialist position when in fact they need a Data Analyst. Another may hire a Junior Data Scientist when they require a Business Intelligence Analyst. Of course, many companies word their job offers brilliantly, but this is not standard across the industry.
To make things a bit clearer, we first have to understand what’s needed in the data world. We will do that using the Data Pyramid of Needs by Monica Rogati (full article here), inspired by Maslow’s famous Hierarchy of Needs. With this framework clarified, we can then see where Data Scientists fall along this spectrum.
The Data Pyramid of Needs

Image by Monica Rogati on Hackernoon
Data Generating & Collecting is at the bottom of the pyramid: the basic need. We obviously have to collect some sort of data before we can use it. The key here is understanding what data is needed, but also what data is available.
Next, Moving & Storing the data into the system is pretty important, and it’s actually covered pretty well in the media because of Big Data and all the challenges related to it.
Once data is accessible, detecting and correcting (or removing) corrupt or inaccurate records becomes crucial. Exploring & Transforming data, usually called Data Cleaning, is an underrated side of Data Science.
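To make "detecting and correcting (or removing) corrupt or inaccurate records" a bit more concrete, here is a minimal sketch in plain Python. The records, the field names, and the validity rules (for example, accepting ages between 0 and 120) are all made up for illustration; a real project would define its own rules and would often use a library such as pandas instead.

```python
from typing import Optional

# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"user_id": "1", "age": "34", "country": "IT"},
    {"user_id": "2", "age": "-5", "country": "it "},   # impossible age
    {"user_id": "3", "age": "abc", "country": "DE"},   # age is not a number
    {"user_id": "1", "age": "34", "country": "IT"},    # duplicate of the first record
    {"user_id": "4", "age": "51", "country": ""},      # missing country
]


def clean_record(record: dict) -> Optional[dict]:
    """Return a corrected copy of the record, or None if it cannot be repaired."""
    try:
        age = int(record["age"])
    except ValueError:
        return None  # corrupt value: remove the record
    if not 0 <= age <= 120:  # assumed plausibility range
        return None  # inaccurate value: remove the record
    country = record["country"].strip().upper()  # correct formatting issues
    if not country:
        country = "UNKNOWN"  # keep the record but flag the missing value
    return {"user_id": int(record["user_id"]), "age": age, "country": country}


def clean_dataset(records: list) -> list:
    """Apply clean_record to every record and drop duplicates by user_id."""
    cleaned, seen_ids = [], set()
    for record in records:
        fixed = clean_record(record)
        if fixed is None or fixed["user_id"] in seen_ids:
            continue  # skip unrepairable records and duplicates
        seen_ids.add(fixed["user_id"])
        cleaned.append(fixed)
    return cleaned


cleaned_records = clean_dataset(raw_records)
```

With the sample data above, only the records for users 1 and 4 survive: two rows are dropped as corrupt or implausible, one as a duplicate, and the missing country is flagged rather than guessed.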
Only once the data is clean and reliable do BI & Analytics come into play, which in a nutshell means drawing insights from the data and building metrics to measure success.
Then the data product needs Learning & Optimizing, through A/B Testing and Experimentation, which make it possible to assess which product versions perform best.
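As a rough illustration of how an A/B test can be assessed, the sketch below compares the conversion rates of two product versions with a two-proportion z-test, using only the Python standard library. This is one common approach, not the only one, and the function name and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a, conv_b: number of conversions in versions A and B.
    n_a, n_b: number of visitors exposed to versions A and B.
    Assumes the pooled rate is strictly between 0 and 1.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the hypothesis that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Made-up experiment: 5,000 visitors per version.
rate_a, rate_b, z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

With these invented numbers, version B converts at 5.2% against 4.0% for version A, and the p-value falls well below the conventional 0.05 threshold, so the difference would usually be treated as unlikely to be due to chance alone.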
What’s mainly covered in the media is the last part of the pyramid, namely AI & Deep Learning. We’ve heard about it over and over. But it’s actually not the highest priority, or at least it’s not the thing that yields the most results for the least effort. That’s why AI and Deep Learning sit at the top of the hierarchy of needs.
Size Matters…
Now that we understand what is needed in the data world, what do Data Scientists actually do? The size of the company is a key factor here.
Let me explain why using the following three cases:
Start-up
Start-ups usually lack resources. They can probably afford only one Data Scientist, so that one Data Scientist has to do everything. They may not be doing AI or deep learning, because that’s not a priority for the company.
A Data Scientist working in a Start-up might have to set up the whole data infrastructure, write some software code to add logging, then do the analytics, build the metrics, and examine different solutions with A/B Testing.
Medium-Sized Company
Now let’s look at Medium-Sized Companies. They have a lot more resources, so they can at least separate the Data Engineers from the Data Scientists, splitting the pyramid's tasks in two.
Usually they will have Data Engineers doing Data Collection and taking care of all the ETL processes. The Data Scientist will then take care of the rest of the pyramid, including AI if the company builds a lot of recommendation models or other things that require it.
Large Company
Large companies probably have a lot more money and can therefore spend more on employees. This is why they can have many different employees working on different things, so that each one can focus on what they are best at.
So, for example, data collection via instrumentation and sensors is all handled by Software Engineers. Cleaning and building data pipelines are for Data Engineers. Data Scientists will therefore focus only on Analytics. AI and Deep Learning will then be the remit of Research Scientists, who could be backed by Machine Learning Engineers.

Conclusion
In summary, there is no single job description for a Data Scientist: the definition varies with the size of the company.
I hope this article has given you a bit more clarity on the actual role of a Data Scientist. If you are considering it as the next step in your career, you might want to look at the size of the company you want to work for, since that will drastically change your responsibilities and overall tasks.
Please do reach out to me with all the suggestions and comments you may have.
Thank you for reading!


