What is Data Science?
Data Science is an interdisciplinary field whose primary objective is the extraction of meaningful knowledge and insights from data. These insights are extracted with the help of various mathematical and Machine Learning-based algorithms. Hence, Machine Learning is a key element of Data Science.
Alongside Machine Learning, as the name suggests, “data” itself is the fuel for Data Science. Without appropriate data, key insights cannot be extracted. Both the volume and the accuracy of data matter in this field, since the algorithms are designed to “learn” with “experience”, which comes through the data provided. Data Science involves the use of various types of data from multiple sources, such as image data, text data, video data, audio data, time-dependent data, and time-independent data.
Data Science requires knowledge of multiple disciplines. As shown in the figure, it is a combination of Mathematics and Statistics, Computer Science skills, and Domain Specific Knowledge. Without a mastery of all these sub-domains, one’s grasp of Data Science will be incomplete.
What is Machine Learning?
Machine Learning is a subset of Artificial Intelligence. It primarily involves the scientific study of algorithms and of mathematical and statistical models that perform a specific task by analyzing data, without explicit step-by-step instructions, relying instead on patterns and inferences drawn from the data. This also contributes to its alias, Pattern Recognition.
Its objective is to recognize patterns in given data and draw inferences that allow it to perform a similar task on similar but unseen data. These two separate sets of data are known as the “Training Set” and the “Testing Set” respectively.
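The split into a Training Set and a Testing Set can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `train_test_split`, the 25% test fraction, the fixed seed, and the toy data are assumptions made for the example, not something prescribed by the text above.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of `samples` and return (training_set, testing_set)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: (feature, label) pairs.
data = [(x, x % 2) for x in range(20)]
training_set, testing_set = train_test_split(data)
print(len(training_set), len(testing_set))  # 15 5
```

The model is fitted only on `training_set`; `testing_set` is held back so that performance is measured on data the model has not seen.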
Machine Learning primarily finds its applications in solving complex problems that a conventional procedure-oriented program cannot solve, or in cases where there are too many variables to program explicitly.
As shown in the figure, Machine Learning is primarily of three types, namely: Supervised Learning, Unsupervised Learning and Reinforcement Learning.
- Supervised Learning: This is the most commonly used form of machine learning and is widely used across the industry. In fact, most of the problems that are solved by Machine Learning belong to Supervised Learning. A learning problem is known as supervised learning when the data is in the form of feature-label pairs. In other words, the algorithm is trained on data where the ground truth is known. This is learning with a teacher. Two common types of supervised learning are:
- Classification: This is a task where each input is assigned to one of a set of discrete categories. For example, if the input to the algorithm is an image of a dog or a cat, a well-trained algorithm should be able to predict whether the input image is that of a dog or of a cat.
- Regression: This is a task where the target values are continuous. That is, the output of the function is not a category but a continuous value. For example, an algorithm that forecasts the future price of a stock would output a continuous value (like 34.84) for a given set of inputs.
- Unsupervised Learning: This is a less commonly used, but quite important, learning technique. It is primarily used when the data is unlabeled, that is, when no target values are provided. In such learning, the algorithm has to analyze the data itself and bring out insights based on certain common traits or features in the dataset. This is learning without a teacher. Two common types of unsupervised learning are:
- Clustering: Clustering is a well-known unsupervised learning technique where similar data points are automatically grouped together by the algorithm based on common features or traits (e.g. color, values, similarity, difference, etc.).
- Dimensionality Reduction: Yet another popular unsupervised learning technique is dimensionality reduction. The dataset used for machine learning is often huge and high-dimensional (more than three dimensions). One major problem in working with high-dimensional data is data visualization. Since we can visualize and understand only up to 3 dimensions, higher-dimensional data is often difficult for human beings to interpret. In addition, more dimensions mean more features, which in turn means a more complex model, and this often hurts how well a machine learning model generalizes (the so-called curse of dimensionality). The aim is to keep the simplest model that works best on a wide range of unseen data. Hence, dimensionality reduction is an important part of working with high-dimensional data. One of the most common methods of dimensionality reduction is Principal Component Analysis (PCA).
- Reinforcement Learning: This is a completely different approach to “learning” when compared to the previous two categories. This class of learning algorithms primarily finds its applications in Game AI, Robotics, and Automatic Trading Bots. Here, the machine is not provided with a huge amount of data. Instead, in a given scenario (playground), some parameters and constraints are defined and the algorithm is let loose. The only feedback given to the algorithm is that if it wins or performs a correct task, it is rewarded, and if it loses or performs an incorrect task, it is penalized. Based on this minimal feedback, the algorithm learns over time how to do the correct task on its own.
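The contrast between the first two learning types in the list above can be made concrete with a small sketch. It uses only plain Python. The nearest-neighbour classifier and the one-dimensional k-means routine are stand-ins chosen for brevity, and all names and toy values are hypothetical; real projects would normally use a library. Reinforcement Learning is not covered here.

```python
def nearest_neighbour_predict(train_pairs, x):
    """Supervised: return the label of the training feature closest to x."""
    _, label = min(train_pairs, key=lambda pair: abs(pair[0] - x))
    return label


def k_means_1d(values, centres, iterations=10):
    """Unsupervised: group unlabeled 1-D values around the given starting centres."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            # Assign each value to the index of its nearest centre.
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


# Supervised: feature-label pairs, so the ground truth is known during training.
labelled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
print(nearest_neighbour_predict(labelled, 2.0))  # small
print(nearest_neighbour_predict(labelled, 7.0))  # large

# Unsupervised: the same kind of values, but with no labels at all.
unlabelled = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centres, clusters = k_means_1d(unlabelled, centres=[0.0, 5.0])
print(centres)   # [1.5, 9.0]
print(clusters)  # [[1.0, 1.5, 2.0], [8.0, 9.0, 10.0]]
```

In the supervised case the labels steer the prediction directly; in the unsupervised case the two groups emerge only from how close the values are to each other.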
What is Artificial Intelligence?
Artificial Intelligence is a vast, multidisciplinary field that aims to create “intelligence” in machines, similar to that displayed by humans and animals. The term is used to describe machines that mimic cognitive functions such as learning and problem-solving.
Artificial Intelligence can be broadly classified into three parts: Analytical AI, Human-Inspired AI, and Humanized AI.
- Analytical AI: It only has characteristics which are consistent with Cognitive Intelligence. It generates a cognitive representation of the world around it based on past experiences, which inspires future decisions.
- Human-Inspired AI: In addition to having Cognitive Intelligence, this class of AI also has Emotional Intelligence. It has a deeper understanding of human emotions and thus a better understanding of the world around it. Both Cognitive Intelligence and Emotional Intelligence contribute to the decision making of Human-Inspired AI.
- Humanized AI: This is the most advanced form of AI among the three. This form of AI incorporates Cognitive Intelligence, Emotional Intelligence, and Social Intelligence into its decision making. With a broader understanding of the world around it, this form of AI is able to make self-conscious and self-aware decisions in its interactions with the external world.
How are they interrelated?
From the above introductions, it may seem that these fields are not related to each other. However, that is not the case. The three fields are more closely related than they may seem.
Viewed as a Venn diagram, Artificial Intelligence, Machine Learning, and Data Science are overlapping sets: Machine Learning is a subset of Artificial Intelligence, and a significant part of Data Science falls within Artificial Intelligence and Machine Learning.
Artificial Intelligence is the broadest field and incorporates most of the other intelligence-related fields of study. Machine Learning, being a part of AI, deals with algorithmic learning and inference based on data. Data Science is primarily based on statistics and probability theory and draws significantly on Machine Learning; since Machine Learning is a subset of Artificial Intelligence, AI is a part of Data Science as well.
Similarities: All three fields have one thing in common: Machine Learning. Each of them depends heavily on Machine Learning algorithms.
In Data Science, the statistical algorithms that are used are limited to certain applications. In most cases, Data Scientists rely on Machine Learning techniques to extract inferences from data.
Current technological advances in Artificial Intelligence are heavily based on Machine Learning. AI without Machine Learning is like a car without an engine: without the “learning” part, Artificial Intelligence is essentially Expert Systems plus Search and Optimization algorithms.
Difference between the three
Even though they are significantly similar to each other, there are still a few key differences that are to be noted.
| Data Science | Machine Learning | Artificial Intelligence |
| --- | --- | --- |
| The main goal is the analysis of data and drawing meaningful insights from it through statistical and algorithmic methods. | The main goal is to recognize patterns in data through algorithms that “learn” from the given data and perform well on unseen data. | The main goal is to give machines “intelligence”, such that they are socially, emotionally, and logically aware of their surroundings. |
| Machine Learning, Statistics, and Probability theory are its core building blocks. | It is one of the fundamental technologies that fuel other fields. It is primarily based on fields of study like Calculus, Linear Algebra, and Deep Learning. | Machine Learning, Expert Systems, Search and Optimization Algorithms, Statistics, Probability, Linear Algebra, and Calculus are the basic building blocks of AI. |
| Very common in terms of Job Profile. | Less common in terms of Job Profile. | Job profiles very rarely ask for Artificial Intelligence. |
| This is a commercial and research-oriented domain. | This is both a commercial and research-oriented domain. | This is more of a research-oriented domain. |
Since all three domains are interrelated, they share some applications, while others are unique to each of them. Most applications involve the use of Machine Learning in some form or the other. Even so, each domain has certain applications of its own. A few of them are listed below:
- Data Science: The applications in this domain depend on machine learning and on mathematical algorithms, such as those based on statistics and probability.
- Time Series Forecasting: This is a very important application of data science and is used across the industry, primarily in the banking sector and the stock market sector. Even though there are Machine Learning based algorithms for this specific application, Data Scientists usually prefer the statistical approach.
- Recommendation Engines: This is a statistics-based approach towards recommending products or services to users, based on data about their previous interests. As with the previous application, Machine Learning based algorithms that achieve similar or better results also exist.
- Machine Learning: The applications of this domain are nearly limitless. Every industry has some problem that can partially or fully be solved by Machine Learning techniques. Even Data Science and Artificial Intelligence roles make use of Machine Learning to solve a huge set of problems.
- Computer Vision: This is a sub-field of Machine Learning that deals with visual information. It finds applications in many industries, for example, Autonomous Driving Vehicles, Medical Imaging, Autonomous Surveillance Systems, etc.
- Natural Language Processing: Similar to the previous example, this is also a self-contained sub-field of research. Natural Language Processing (NLP) or Natural Language Understanding (NLU) primarily deals with the interpretation and understanding of the meaning behind spoken or written text/language. Understanding the exact meaning of a sentence is quite difficult (even for human beings). Teaching a machine to understand the meaning behind a text is even more challenging. A few of the major applications of this sub-field are the development of intelligent chatbots, artificial voice assistants (Google Assistant, Siri, Alexa, etc.), spam detection, hate speech detection, and so on.
- Artificial Intelligence: Most of the current advancements and applications in this domain are based on a sub-field of Machine Learning known as Deep Learning. Deep Learning deals with artificially emulating the structure and function of the biological neuron. However, since a few of the applications of Deep Learning have already been discussed under Machine Learning, let us look at applications of Artificial Intelligence that are not primarily dependent on Machine Learning.
- Game AI: Game AI is an interesting application of Artificial Intelligence, where the machine automatically learns to play complex games to the level where it can challenge and even win against a human being. Google’s DeepMind developed a Game AI called AlphaGo, which beat the top-ranked human Go player in 2017. Similarly, video game AIs have been developed to play Dota 2, Flappy Bird, and Mario. These models are developed using several kinds of algorithms, such as Search and Optimization, Generative Models, Reinforcement Learning, etc.
- Search: Artificial Intelligence has found several applications in Search Engines, for example, Google and Bing Search. The method of displaying results and the order in which results are displayed are based on algorithms developed in the field of Artificial Intelligence. These applications do contain Machine Learning techniques, but their older versions relied on algorithms such as Google’s PageRank, which was not based on “Learning”.
- Robotics: One of the major applications of Artificial Intelligence is in the field of robotics. Teaching robots to walk and run on their own (for example, Spot and Atlas) using Reinforcement Learning has been one of the biggest goals of companies like Boston Dynamics. In addition, humanoid robots like Sophia are often presented as an example of AI moving toward Humanized AI.
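To illustrate what a ranking algorithm that is not based on “learning” can look like, here is a minimal sketch of the basic, publicly described PageRank iteration mentioned under Search. It is a textbook-style simplification, not Google’s production system. The damping factor of 0.85, the iteration count, and the four-page toy graph are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by link structure alone.

    `links` maps every page to the list of pages it links to; every link
    target must itself appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web of four pages.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

No training data is involved: the scores follow entirely from which pages link to which. In this toy graph page C, which receives the most links, ends up ranked highest, and page D, which receives none, ends up lowest.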
Since the fields are interrelated to a significant degree, the skill-sets required to master them largely overlap. However, there are a few skills that are uniquely associated with each of them. These are discussed below.
- Mathematics: Each of these fields is math heavy: mathematics is their basic building block, and a strong math background is necessary to fully understand and master the algorithms. However, not all fields of math are necessary for all of them. The specific fields of math that are required are discussed below:
- Linear Algebra: Since all of these fields are based on data, which comes in huge volumes of rows and columns, matrices are the easiest and most convenient method of representing and manipulating such data. Hence, a thorough knowledge of Linear Algebra and Matrix operations is necessary for all three fields.
- Calculus: Deep Learning, a sub-field of Machine Learning, is heavily dependent on calculus, and more precisely on multivariate derivatives. In neural networks, the backpropagation algorithm requires many derivative calculations, which demands a thorough knowledge of calculus.
- Statistics: Since these fields deal with a huge amount of data, knowledge of statistics is imperative. Statistical methods for selecting and testing smaller, diverse samples are a common application across all three fields. However, statistics finds its main application in Data Science, where many of the algorithms are purely based on statistics (e.g. the ARIMA algorithm used for Time Series Analysis).
- Probability: As with statistics, probability is essential: the conditional probability of an event is the basic building block of important Machine Learning algorithms like the Naive Bayes Classifier. Probability theory is also very important in understanding Data Science algorithms.
- Computer Science: All of these fields are undoubtedly part of Computer Science. Hence, a thorough knowledge of computer science algorithms is quite necessary.
- Search and Optimization Algorithms: Fundamental Search Algorithms like Breadth-First Search (BFS), Depth-First Search (DFS), Bidirectional Search, Route Optimization Algorithms, etc. are quite important. These search and optimization algorithms find their use in the Artificial Intelligence field.
- Fuzzy Logic: Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. It imitates the way human beings make decisions: rather than a strict YES or NO, it allows intermediate degrees of truth when deciding based on a certain set of events or environmental conditions. Fuzzy Logic is primarily used in Artificially Intelligent Systems.
- Basic Algorithms and Optimization: Although not strictly a necessity, this is good to have, since fundamental knowledge of algorithms (searching, sorting, recursion, etc.) and optimization (space and time complexity) is useful in any computer science related field.
- Programming Knowledge: The algorithms in these fields are implemented through programming. Hence, a thorough knowledge of programming is a necessity. Some of the most commonly used programming languages are discussed below.
- Python: One of the most commonly used programming languages for any of these fields is Python. It is used across the industry and is supported by a plethora of open source libraries for Machine Learning, Deep Learning, Artificial Intelligence, and Data Science. However, programming is not just about writing code; it is about writing proper Pythonic code. This is discussed in detail in the article A Guide to Best Python Practices.
- R: This is the second most used programming language for such applications across the industry. Compared to Python, R excels in statistical libraries and data visualization, but it lags significantly when it comes to Deep Learning libraries. Hence, R is a preferred tool for Data Scientists.
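As an example of the fundamental search algorithms named in the list above, here is a minimal sketch of Breadth-First Search (BFS) that finds a path with the fewest edges in a small graph. The adjacency-list representation, the function name `bfs_shortest_path`, and the toy graph are assumptions made for illustration.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}  # doubles as the set of visited nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                if neighbour == goal:
                    # Walk the parent links back to the start to rebuild the path.
                    path = [goal]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None


# Hypothetical toy graph as an adjacency list.
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}
print(bfs_shortest_path(toy_graph, "A", "F"))  # ['A', 'B', 'D', 'F']
print(bfs_shortest_path(toy_graph, "F", "A"))  # None
```

Because BFS explores the graph level by level, the first time it reaches the goal it has used as few edges as possible. Depth-First Search would replace the queue with a stack and gives no such guarantee.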
Jobs in each of these fields are in very high demand. As Andrew Ng put it, “AI is the new electricity”. This rings true, as the extended field of Artificial Intelligence is on the verge of revolutionizing every industry in ways that could not be anticipated earlier.
Hence, the demand for jobs in the field of Data Science and Machine Learning is quite high. There are more job openings worldwide than qualified Engineers to fill them. Because of this imbalance between supply and demand, the compensation offered by companies for such roles is often higher than in many other domains.
The job scenario for each of the domains is discussed below:
- Data Science: The number of job postings with the profile of Data Science is the highest among the three domains discussed. Data Scientists are handsomely paid for their work. Because the lines between the fields are blurred, the job description of a Data Scientist ranges from Time Series Forecasting to Computer Vision; it basically covers the entire domain. For further insights on the job aspect of Data Science, refer to the article What is Data Science.
- Machine Learning: Even though the number of job postings with the profile “Machine Learning Engineer” is much lower than that for Data Scientists, it is still a significant field to consider when it comes to availability of jobs. Moreover, someone who is skilled in Machine Learning is a good candidate for a Data Science role. However, unlike Data Science, Machine Learning job descriptions primarily deal with “Learning” algorithms (including Deep Learning), and the work ranges from Natural Language Processing to developing Recommendation Engines.
- Artificial Intelligence: Job postings with the profile “Artificial Intelligence Developer” are quite rare. Instead of “Artificial Intelligence”, most companies write “Data Scientists” or “Machine/Deep Learning Engineers” in the job profile. However, Artificial Intelligence Developers, in addition to getting jobs in the Machine Learning domain, mostly find jobs in Robotics and AI R&D oriented companies like Boston Dynamics, DeepMind, OpenAI, etc.
Data Science, Machine Learning and Artificial Intelligence are like the different branches of the same tree. They are highly overlapping and there is no clear boundary amongst them. They have common skill set requirements and common applications as well. They are just different names given to slightly different versions of AI.
Finally, it is worth mentioning that since the required skill-sets overlap heavily, a suitably skilled Engineer is eligible to work in any of the three domains and can switch between them without any major changes.