# An Introduction to Statistics and Data Science and Differences between Them

By Maroje Portada on February 16, 2022

There is a great deal of overlap between the fields of statistics and data science, to the point where many definitions of one discipline could just as easily describe the other. Even so, there are many differences between them. Why is statistics important, and what is its connection to data science? What is data science? What are their similarities and differences? Let’s try to understand both better, at least on a basic level, without going too deep into subtle details.

There are many long and complicated definitions of statistics. Those are less interesting for anyone not well versed in this field. Here are several simple definitions instead:

- Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data. (https://www.merriam-webster.com/dictionary/statistics)
- Statistics is a collection of quantitative data. (https://www.merriam-webster.com/dictionary/statistics)
- Statistics is a set of mathematical methods and tools that enable us to answer important questions about data. (https://www.freecodecamp.org/news/statistics-for-data-science/)

Within statistics there are two branches: descriptive and inferential statistics.

Descriptive statistics provides methods to organize, summarize, and present raw data in a more convenient and informative form: information. That information can then be interpreted and shared. Descriptive statistics offers many graphical and numerical techniques for describing data.
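As a small illustration of the numerical side of descriptive statistics, here is a minimal sketch using Python's standard `statistics` module. The temperature values are made up for this example and are not taken from any real data set.

```python
import statistics

# Hypothetical daily temperatures in degrees Celsius (illustrative data only)
temperatures = [12.0, 15.0, 11.0, 14.0, 18.0, 15.0, 13.0]

# Summarize the raw numbers with a few common descriptive measures
summary = {
    "count": len(temperatures),
    "mean": statistics.mean(temperatures),
    "median": statistics.median(temperatures),
    "mode": statistics.mode(temperatures),
    "stdev": statistics.stdev(temperatures),  # sample standard deviation
    "min": min(temperatures),
    "max": max(temperatures),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Seven raw numbers become a handful of measures that are easier to interpret and share, which is the point of descriptive statistics.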

Inferential statistics offers methods to examine a small sample of data and make estimates or draw conclusions about the larger set of data it came from, called the population. Those estimates and conclusions can be correct, but because they rest on a sample rather than the whole population, they always carry some uncertainty.
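The idea of estimating a population value from a sample can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the population is simulated (hypothetical heights in centimetres), the sample size of 200 is arbitrary, and the interval uses a simple normal approximation rather than any particular method from a statistics textbook.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated heights in cm
population = [random.gauss(170, 10) for _ in range(100_000)]

# In practice we only get to look at a small random sample
sample = random.sample(population, 200)

sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval (normal approximation)
z = statistics.NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"sample mean: {sample_mean:.2f}")
print(f"95% interval: {ci_low:.2f} to {ci_high:.2f}")
print(f"true population mean: {statistics.mean(population):.2f}")
```

The interval expresses how uncertain the estimate is. With this procedure, roughly 95 out of 100 such intervals would be expected to contain the true population mean, which also means some of them will miss it.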

## What Can We Use Statistics for?

We encounter statistics in our lives much more often than we are aware of. Examples include weather prediction, election polls, estimates of economic growth, stock prices, demographics, sports statistics, the behavior of users on social networks, trending topics, successful sales on social networks, and much more.

Wherever there is data, there is potential for the use of statistics. Many complex problems in all parts of our lives can be solved with statistics. It is important to note that statistics helps us make better-informed decisions with less risk and uncertainty. While intuition is useful, we should always use as much information as we can to make better decisions.

## What Is Data Science?

After covering the topic of statistics, it is time to say something about data science as well. One simple definition of data science considers it a multidisciplinary field that combines some technical skills with soft skills to extract information from structured and unstructured data.

The principal purpose of data science is to find patterns in data. The field is still expanding, and its evolution depends heavily on the development of technology, especially computer science and programming languages.

## Similarities and Differences between Data Scientists and Statisticians

The work of data scientists and statisticians is so closely related that the two roles are often treated as synonyms. That is mostly not true: there are also many differences that distinguish the two.

*What are the similarities between data scientists and statisticians?*

Both roles:

- need some degree of understanding of mathematics;
- investigate problems;
- analyse data;
- analyse trends;
- create forecasts;
- use visualisations;
- often report their findings to non-technical users.

*What are the differences?*

- Data scientists use computer science, algorithms or machine learning more than statisticians.
- Data scientists are more involved in creation and use of data systems, while statisticians focus more on the equations and mathematical models that they use for their analysis.
- Data scientists more often use big data, while statisticians typically use smaller data sets.
- Data scientists compare many methods to create the best machine learning model, while statisticians more often improve a single model until it fits their data set.
- Statisticians focus more on quantifying uncertainty and making inferences.
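To make the "compare many methods" point more concrete, here is a minimal sketch of that workflow in plain Python. The data, the two candidate models (a constant mean baseline and a least-squares straight line), and the helper names `fit_mean`, `fit_line`, and `mean_squared_error` are all hypothetical and chosen only for illustration. Real projects would typically compare more sophisticated models with a dedicated library.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: y depends roughly linearly on x, plus random noise
xs = [float(i) for i in range(100)]
ys = [2.0 * x + 5.0 + random.gauss(0, 10) for x in xs]

# Hold out every fifth point for testing; train on the rest
pairs = list(zip(xs, ys))
train = [p for i, p in enumerate(pairs) if i % 5 != 0]
test = [p for i, p in enumerate(pairs) if i % 5 == 0]


def fit_mean(data):
    """Candidate 1: always predict the average y seen in training."""
    mean_y = statistics.mean(y for _, y in data)
    return lambda x: mean_y


def fit_line(data):
    """Candidate 2: least-squares straight line through the training data."""
    mean_x = statistics.mean(x for x, _ in data)
    mean_y = statistics.mean(y for _, y in data)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, data):
    """Average squared difference between predictions and actual values."""
    return statistics.mean((model(x) - y) ** 2 for x, y in data)


# Fit every candidate on the training data, score it on the held-out data
candidates = {"mean baseline": fit_mean, "straight line": fit_line}
scores = {name: mean_squared_error(fit(train), test)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: test MSE = {score:.1f}")
print(f"best candidate: {best}")
```

Each candidate is judged by its error on data it has not seen, and the one with the lowest error is kept. A statistician would more likely stay with one model and examine its assumptions and uncertainty in detail.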

## Final Thoughts

Statistics and data science have a lot in common: the use of mathematics, the investigation of problems, and data analysis are just a few examples. There are also differences, such as the level of information technology used, the usual size of data sets, and the approach to building models.

Most certainly data science and statistics will continue to coexist and to some extent influence one another.

The goal of this post was to bring these fields closer, in the simplest possible way, to people who don’t know much about them. Would you like to share your experience with statistics and data science? Your thoughts and comments are more than welcome.
