Statistics has been in use in the analytics space for decades. But machine learning has emerged as a new domain and, to some extent, overshadowed it.
I have enjoyed being a data scientist for quite some time now. When I started my career, data science did not exist as a field of opportunity. Statisticians ruled the world of analytics then.
Nothing lasts forever. Data science emerged as a domain and has become one of the most exciting jobs of the 21st century. Knowing statistics is not enough to be a data scientist in the current industry scenario. The job needs a mix of skills, including statistics, machine learning, programming, and storytelling.
Gradually, machine learning has gained a stronghold over the analytics industry. Companies are more eager to hire machine learning experts than statisticians. This leads to a natural question: why is machine learning becoming so popular in the analytics industry? Is there a concrete reason behind it, or is it merely hype?
1. Data is cheap and accessible
In the early days of statistics, data was scarce. Storing data was not yet a widespread practice, and electronic storage was rarely available. So most statistical methods were built to draw inferences from a handful of observations.
In statistics, a sample of five units is considered a small sample. By a common rule of thumb, a sample size greater than thirty is called a large sample.
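To make the small-versus-large distinction concrete, here is a minimal sketch of how the estimated standard error of a sample mean shrinks as the sample grows. The numbers are made up for illustration, and the larger sample is simply the small one repeated so that only the sample size changes. The helper name `standard_error` is my own, not something from a particular library.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical measurements, made up for illustration only.
small_sample = [1, 2, 3, 4, 5]     # n = 5, a "small" sample
# Artificial trick: repeat the same values so the spread stays similar
# and only the sample size changes (n = 35, a "large" sample).
large_sample = small_sample * 7

se_small = standard_error(small_sample)
se_large = standard_error(large_sample)

print(f"n = {len(small_sample):2d}: standard error = {se_small:.3f}")
print(f"n = {len(large_sample):2d}: standard error = {se_large:.3f}")
```

With only five observations the estimate of the mean is far less certain, which is why classical small-sample methods had to be so careful about their assumptions.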
Machine learning is a data-intensive approach. Models need to see lots of examples to learn the patterns in the data.
Technology has made this task easier. Data storage has become cheap and convenient, so machine learning models now get lots of data to train on. As a result, professionals started preferring it.
2. Deep learning has outperformed many standard methods
Deep learning has been a game-changer for a broad segment of the analytics industry. Artificial neural networks with multiple layers have shown remarkable success. The idea of backpropagation, where the error is reduced a little in each iteration, fascinates me. The ability to use different activation functions within the same model is another property that helps it capture complex patterns in the data.
Statistical regression methods are much simpler, but they are not as effective at capturing complex patterns in the data. Standard models such as ordinary least squares are also typically fitted in a single closed-form step rather than through iterative error correction.
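The idea of iterative error reduction can be sketched with a very small network. The example below is only an illustration under my own assumptions: a single hidden layer with tanh activations, a sigmoid output (so two different activation functions in one model), the XOR toy problem, and plain full-batch gradient descent. The function name `train_xor` and all hyperparameters are hypothetical choices, not a recommended setup.

```python
import math
import random

def train_xor(hidden=4, lr=0.1, epochs=5000, seed=0):
    """Train a tiny 2-input network on XOR and return the loss per epoch."""
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    n = len(data)
    # Hidden layer weights/biases and output layer weights/bias.
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, y in data:
            # Forward pass: tanh hidden units, sigmoid output.
            h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
                 for j in range(hidden)]
            z = sum(w2[j] * h[j] for j in range(hidden)) + b2
            p = 1.0 / (1.0 + math.exp(-z))
            loss += (p - y) ** 2 / n
            # Backward pass: push the error back through each layer.
            dz = 2.0 * (p - y) * p * (1.0 - p) / n
            gb2 += dz
            for j in range(hidden):
                gw2[j] += dz * h[j]
                dh = dz * w2[j] * (1.0 - h[j] ** 2)
                gb1[j] += dh
                gw1[j][0] += dh * x[0]
                gw1[j][1] += dh * x[1]
        losses.append(loss)
        # Gradient descent step: a small correction in every iteration.
        b2 -= lr * gb2
        for j in range(hidden):
            w2[j] -= lr * gw2[j]
            b1[j] -= lr * gb1[j]
            w1[j][0] -= lr * gw1[j][0]
            w1[j][1] -= lr * gw1[j][1]
    return losses

losses = train_xor()
print(f"loss at first epoch: {losses[0]:.4f}")
print(f"loss at last epoch:  {losses[-1]:.4f}")
```

XOR is a handy toy case because a straight-line fit cannot separate it, while even a small network with a nonlinear hidden layer has a chance to. How well this particular run fits depends on the seed and the settings above.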
3. New fields of study have emerged where machine learning is immensely successful
Domains like Computer Vision and Natural Language Processing have made use of machine learning and achieved remarkable success. In some Computer Vision tasks, machine learning models even outperform humans.
Classical statistics lacks methods designed for such problems. Besides, these fields of application emerged with the progress of technology; they were beyond imagination a few decades ago. For this kind of problem, we rarely think of statistics as a viable option.
4. A large pool of data scientists has a technology background
Aspirants and professionals from different backgrounds are joining the analytics industry. A lot of technology graduates are opting for data science as a career. Many of them do not have adequate exposure to statistics. But a major part of them, especially those who are from a computer science background, have familiarity with machine learning.
Those who have not taken a machine learning course are opting for online courses. For technology graduates, machine learning feels more appealing than statistics.
5. Sometimes correct prediction is more important than causality
In statistics, we study each variable and the related causality in detail. The key focus remains on understanding the reason behind the behavior of every variable before building a model.
But in some use cases, understanding the behavior of the variables and their relationships is not the point of interest. Rather, the final prediction from the model matters more.
Machine learning models make it possible to build predictive models without diving deep into the statistical properties of the variables.
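As a rough illustration of this prediction-first mindset, the sketch below uses a one-nearest-neighbor rule and judges it only by accuracy on held-out points. Nothing about the variables' distributions or causal roles is examined. The data, the labels, and the helper names `nearest_neighbor_predict` and `holdout_accuracy` are all made up for this example.

```python
def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query`."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda row: sq_dist(row[0], query))
    return label

def holdout_accuracy(train, test):
    """Share of held-out points whose label is predicted correctly."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)

# Hypothetical toy data: (features, label) pairs, invented for illustration.
train = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
         ((8.0, 9.0), "high"), ((9.0, 8.5), "high")]
test = [((2.0, 1.5), "low"), ((8.5, 9.5), "high")]

accuracy = holdout_accuracy(train, test)
print(f"hold-out accuracy: {accuracy:.2f}")
```

The only question asked here is whether the predictions are right on data the model has not seen, which is exactly the trade-off this section describes.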
6. Sometimes machine learning models yield good results with less effort than statistical models
Statistical analysis depends heavily on the distribution and statistical properties of the dependent and independent variables. An in-depth study of the relationships between the variables is essential, and the significance of each variable in the model needs to be tested. Only a carefully built model will give good enough predictions.
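To give a feel for what testing the significance of a variable involves, here is a small sketch that fits a simple linear regression by hand and computes the t-statistic of the slope. The data points are invented for illustration, and the function name `slope_t_statistic` is my own. The usual assumptions of the linear model (such as independent, roughly normal errors) are taken for granted here, although in practice they are part of what the statistician has to check.

```python
import math

def slope_t_statistic(x, y):
    """Fit y = a + b*x by least squares; return (slope, t-statistic of slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)
    slope_se = math.sqrt(residual_variance / sxx)
    return slope, slope / slope_se

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, t_stat = slope_t_statistic(x, y)
print(f"slope = {slope:.2f}, t-statistic = {t_stat:.1f}")
```

The t-statistic is then compared with a critical value from a t-table for n - 2 degrees of freedom (about 3.18 for a two-sided 5% test with 3 degrees of freedom). A value well above that suggests the variable is significant, provided the model assumptions hold.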
Machine learning models, in contrast, are data-hungry. Given enough variables and a large enough data set, they provide good results in many cases, and we may not need a detailed study of the individual characteristics of each variable. Even so, understanding the characteristics of each variable and the relationships between them is always good practice.
7. Machine learning models require less human intervention
Machine learning models depend less on human input than statistical models do. Once built, they can perform well with minimal recurring effort. Statistical approaches require more intervention, with decisions to be made at every step.
As a data scientist, I am not biased toward a particular approach. I believe that as data scientists we should leverage both approaches as the problem requires. Not all real-life problems require building a machine learning model. We can deal with many of them through careful statistical analysis and thoughtful inference. In many other cases, we need machine learning models to understand and predict the patterns in the data. It is up to the practitioner to choose wisely between the two.
7 Reasons Why Machine Learning is Becoming More Popular than Statistics was originally published in ILLUMINATION on Medium.