What is Data Science?

The term Data Science comes with a lot of different interpretations these days, so it might be easier to start with what Data Science is not. Data Science is not about writing code or awesome visualizations. It’s not about making complicated models either.

Data Science is about using data to create as much impact as possible for your company.

Impact can be driven in multiple ways: through insights and visibility into the data, or through data products that enable prediction and classification. To build these, we need tools such as statistical models, visualizations and code.

Data Science aims to solve real company problems using data. What tools do we use? It depends.

What’s Popular vs Industry Needs

Some topics are more fun to talk about than others, so there is a misalignment between what is popular in the media and what the industry needs.

The rise of big data sparked the rise of data science, which supports the need of businesses to draw insights from their massive unstructured data sets. As the term became more popular, the Journal of Data Science described it as:

“[…] almost everything that has something to do with data: Collecting, analyzing, modelling… yet the most important part is its applications — all sorts of applications.”

With the newfound abundance of data, companies are shifting from a knowledge-driven approach to a data-driven approach. The theoretical papers written decades ago about Neural Networks and Support Vector Machines can now be implemented, thanks to the availability of both big data and the hardware needed to build Machine Learning models. As you might have seen, the promise of Machine Learning and Industry 4.0 concepts has been in the news lately.

Machine Learning and AI have dominated the media and overshadowed all other aspects of Data Science, such as exploratory analysis, experimentation and the skills we traditionally call business intelligence. This gives the impression that Data Science is research focused on Machine Learning and AI. Having a cutting-edge Machine Learning model is technically part of the job, but companies have so much low-hanging fruit that in most cases they don’t need models more advanced than what has already been researched.

This tells us that at the heart of Data Science is finding the right problems and solving them through data-driven insights. The process involves financial stakeholders and is guided by business domain specialists, who are in turn offered visibility into the data.

Data Science is more about:

  • Driving Impact
  • Solving Problems
  • Building Strategies

And less about:

  • Advanced Models
  • Data Crunching

In the case of Merino, the architecture enabling data availability is vast and of high quality.

I hope this article gives you a good idea of what capabilities we have and the future directions we could pursue. Feel free to send me any feedback and be part of the process 🙂

Thanks for reading!

Sources:

  • AI Hierarchy of Needs by Monica Rogati
  • Data Science told by a Data Scientist by Joma Tech

Aman Prasad is a music, deep space and data science enthusiast.