You may have heard of terms like data science and machine learning and may have thought what’s the difference between two, the two terms have very different meaning but are used interchangeably very often. In this article, We will discuss machine learning vs data science and would also explain what is deep learning and how are these connected to Big Data.
Generally, when we are finding a solution to a problem using the traditional approach, we define a step by step approach for finding the solution. For example, let's say we are building a self-driving car we will define things like if the road is empty go at fast speed, when a vehicle is in front check whether vehicle in left or right and act accordingly, so basically, a whole lot of if-else statements. But when using machine learning, this approach is flipped, we don't give steps to solve a problem instead we give our inputs and expected output and our algorithm learns on it, and develop the solution to the problem itself and that's the part where the magic happens. Because sometimes for a very complex problem it may happen that we don't even understand what is the approach that our model has built to solve the problem, we just know that it's working. We will discuss this point further when we will talk about deep learning and big data.
Machine Learning is divided into three different categories:
We have already made a small program using the supervised algorithm in my last article, if you have not read it, check it out here. In supervised learning we have our data managed so that for every input we have well-defined outcome, as in my previous article by taking into consideration four different properties of flower, we determined its species. Another example could be we are giving pictures of some animals (say cat, dog and lion) to our model and labelling the picture with the corresponding animal, now we give a picture of the dog to our model, then we will get the output as the dog. In short, we are giving our model known information and we expect our model to be trained from it.
In unsupervised learning, as opposed to supervised learning we feed unlabeled data to our model, for example, you are a professor and want to classify all your students into some groups based on their various characteristics like grades, hobbies, subjects studied. average study hours put by them per week, night owls or early risers. Say you have 60 students and you want to classify them into suitable study groups, you can provide this data to an unsupervised model that does this job for you. You will have groups of students based on their similar interests, same study hours etc. Unsupervised learning based models are used by sellers to classify their different types of customers to better cater to their needs.
In this type of learning, feedback is not given at every stage instead model gets feedback only when it achieves a particular goal. For example, if we use reinforcement based bot which can beat a player at chess, it will only be given feedback if it won the game on the other hand if it was supervised algorithm, we would be giving feedback after every move.
So, basically machine learning is a whole bunch of different algorithms divided into three categories which uses input data to evolve and find a way to solve a given problem.
Although the buzz around data science has been going around for last 4-5 years, data science has been around in the market since 90’s. Only now it has got mass attention like this. Data Science is made of two words data and science, so what is data. Data is any fact, unorganized set of things that needs to be processed to have to mean, for example, an image is data, this article is data, on the other hand, information is what we generally mean when referring to data, information is when data is processed, organized, structured or presented in a given context so as to make it useful. So data science should be actually called information science, but whatever it’s too late to make any change now. The other word in data science is science, so to call it “science” we need to consider the Science and Scientific Method definition. According to this, Data Science is not only about the practical or empirical methods, it needs scientific foundations. Data science consists of wide variety of skill set:
Data science is having knowledge of different fields. Topics/tools that a person needs to understand or have some knowledge when working with Data Science:
Now we have talked about data science and machine learning let us know what is deep learning.
A great video explaining the difference between machine learning and data science.
I highly recommend to watch it.
Deep learning is a sub field of machine learning algorithms which uses Artificial Neural Network (ANN) to solve a problem using trial and error, much like a human brain, like humans learn and adapt according to their environment, like that ANN start with some fixed weights and they are changed according to the output produced. I will discuss deep learning in my future articles so don’t worry if you still have not got a clear understanding of deep learning, we will implement a neural network in python. Terms like data analytics, data mining etc and how they contrast with machine learning and data science is explained here.
Now the last question is what is Big data and how is it related to data science and machine learning. Data collected through various sources is complex and messy, these sources may include IOT devices or machines and human activities online, since this data is too complex and large to be maintained through traditional tools we have moved to Big data which basically includes advanced new tools to solve the problems posed by complex and large datasets.
Since machine learning algorithms depend on data to evolve and find the solution to problems, so a data scientist should have some knowledge of Hadoop and such tools.
I hope you like this article if you want to discuss something or want to know about something particular then drop a comment. Happy Learning.