An Introduction to Statistics and Data Science and Differences between Them
By Maroje Portada on February 16, 2022
There is a great deal of overlap between statistics and data science, to the point where many definitions of one discipline could just as easily describe the other. Even so, there are many differences between them. Why is statistics important, and what is its connection to data science? What is data science? What are their similarities and differences? Let’s try to understand both better, at least on a basic level, without going too far into subtle details.
There are many long and complicated definitions of statistics. Those are less interesting for anyone not well versed in this field. Here are several simple definitions instead:
Within statistics there are two branches: descriptive and inferential statistics.
Descriptive statistics provides methods to organize, summarize and present raw data in a more convenient and informative form, called information. That information can then be interpreted and shared. Descriptive statistics offers many graphical and numerical techniques for describing the data.
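To make this concrete, here is a minimal sketch of numerical descriptive statistics using Python's standard `statistics` module. The temperature values are made up purely for illustration and do not come from any real data set.

```python
import statistics

# Hypothetical raw data: daily temperatures (in °C) for one week.
temperatures = [12.5, 14.0, 13.5, 15.0, 14.0, 16.5, 12.5]

# Summarize the raw values into a handful of informative numbers.
summary = {
    "count": len(temperatures),
    "mean": statistics.mean(temperatures),
    "median": statistics.median(temperatures),
    "min": min(temperatures),
    "max": max(temperatures),
    "stdev": statistics.stdev(temperatures),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Seven raw numbers become a short summary that is easier to interpret and share, which is the whole point of descriptive statistics.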
Inferential statistics offers methods to examine small samples of data and make estimates or draw conclusions about the larger set of data they come from, called the population. Those estimates and conclusions can be correct, but that isn’t always so, because a sample only approximates the population.
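The idea of estimating a population value from a sample can be sketched as follows. This is an illustration under stated assumptions, not a recipe: the population is simulated (hypothetical heights drawn from a normal distribution), and the interval uses the common normal approximation for a 95% confidence interval.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 100,000 people (simulated).
population = [random.gauss(170, 10) for _ in range(100_000)]

# In practice we only get to observe a small random sample.
sample = random.sample(population, 200)

sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval based on the normal distribution.
z = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

population_mean = statistics.mean(population)
print(f"sample estimate: {sample_mean:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"population mean: {population_mean:.2f}")
```

The interval expresses how uncertain the estimate is. Across many different samples, an interval built this way would miss the true population mean about 5% of the time, which is why conclusions drawn from samples are not always correct.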
What Can We Use Statistics for?
Statistics is present in our lives far more than we are aware of: weather prediction, election polls, estimates of economic growth, stock prices, demographics, sports statistics, user behavior on social networks, trending topics, sales on social networks and much more.
Wherever there is data, there is potential for the use of statistics. Many complex problems in all parts of our lives can be solved with statistics. Above all, statistics helps us make better-founded decisions with less risk and uncertainty. While intuition is useful, we should always use as much information as we can to make better decisions.
What Is Data Science?
After covering the topic of statistics, it is time to say something about data science as well. One simple definition of data science considers it a multidisciplinary field that combines some technical skills with soft skills to extract information from structured and unstructured data.
The principal purpose of data science is to find patterns in data. The field is still expanding, and its evolution depends heavily on the development of technology, especially computer science and programming languages.
Similarities and Differences between Data Scientists and Statisticians
The work of data scientists and statisticians is closely related, to the point that the two are often treated as synonyms. That is mostly not accurate: there are also many differences that distinguish them.
What are the similarities between data scientists and statisticians? Both:
need some degree of understanding of mathematics;
often report their findings to non-technical users.
What are the differences?
Data scientists use computer science, algorithms or machine learning more than statisticians.
Data scientists are more involved in creation and use of data systems, while statisticians focus more on the equations and mathematical models that they use for their analysis.
Data scientists more often use big data, while statisticians typically use smaller data sets.
Data scientists compare many methods to create the best machine learning model, while statisticians more often improve a single model until it fits their data set.
Statisticians focus more on quantifying uncertainty and making inferences.
Statistics and data science have a lot in common. The use of mathematics, the investigation of problems and data analysis are just a few examples. There are also differences, such as the level of information technology used, the usual size of data sets and the approach to modeling.
Most certainly data science and statistics will continue to coexist and to some extent influence one another.
The goal of this post was to bring these areas closer, in the simplest possible way, to people who don’t know much about them. Would you like to share your experience with statistics and data science? Your thoughts and comments are more than welcome.