“A computer program is said to learn from experience E with respect to some task
T and some performance measure P, if its performance on T, as measured by P,
improves with experience E.”
- Tom Mitchell, 1997
Your spam filter is a Machine Learning program that, given samples of spam
emails (e.g., flagged by users) and samples of regular (non-spam, also called
“ham”) emails, can learn to flag spam. The examples that the system uses to
learn are called the training set. Each training example is called a training
instance (or sample). In this case, the task T is to flag spam for new emails,
the experience E is the training data, and the performance measure P
must be defined; for example, you could use the ratio of correctly classified emails.
This particular performance measure is called accuracy, and it is often used in
classification tasks. This is an example of a classification problem.
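As a minimal sketch, accuracy is just the ratio of correct predictions; the labels below are invented for illustration:

```python
# Hypothetical true labels and a spam filter's predictions for six emails.
true_labels = ["spam", "ham", "ham", "spam", "ham", "spam"]
predicted   = ["spam", "ham", "spam", "spam", "ham", "ham"]

# Accuracy P: the ratio of correctly classified emails.
correct = sum(t == p for t, p in zip(true_labels, predicted))
accuracy = correct / len(true_labels)
print(accuracy)  # 4 of the 6 predictions match, so 4/6
```

As the filter sees more training data (experience E), this ratio should improve.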
Machine Learning systems are often classified according to the amount
and type of supervision they get during training. There are four major
categories: supervised learning, unsupervised learning, semi-supervised
learning, and Reinforcement Learning.
SUPERVISED LEARNING
In supervised learning, the training set you feed to the algorithm includes
the desired solutions, called labels. In the case of a spam filter, the labels
are “spam” and “not spam”.
A typical supervised learning task is classification. The spam filter is an
example of a classification problem: it classifies each email as spam or
not spam. Classification involves predicting discrete values.
Another typical task is to predict a target numeric (continuous) value,
such as the price of a car, given a set of features (mileage, age,
brand, etc.) called predictors. This sort of task is called regression. To train
the system, you need to give it many examples of cars, including both
their predictors and their labels, i.e., their prices.
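The car-price task can be sketched with a linear least-squares fit in NumPy; every number below is made up purely for illustration:

```python
import numpy as np

# Predictors for four hypothetical cars: mileage (km) and age (years).
X = np.array([
    [50_000, 3],
    [120_000, 8],
    [30_000, 1],
    [90_000, 5],
], dtype=float)
# Labels: the cars' prices.
y = np.array([15_000, 6_000, 20_000, 9_500], dtype=float)

# Fit a linear model y ≈ X @ w via least squares, with an intercept column.
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the price of a new car: 70,000 km, 4 years old (plus intercept term).
new_car = np.array([70_000, 4, 1], dtype=float)
print(float(new_car @ w))  # predicted price for the new car
```

A real regression model would be trained on many more examples, but the shape of the task is the same: predictors in, continuous value out.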
We use different algorithms to implement classification and regression.
Here are some of the most important supervised learning
algorithms:
❖ K-Nearest Neighbors
❖ Linear Regression
❖ Logistic Regression
❖ Support Vector Machines (SVMs), with various kernel functions
❖ Decision Trees and Random Forests (as classifiers or regressors)
❖ Artificial Neural Networks (ANNs)
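To show the idea behind the first algorithm in the list, here is a toy k-nearest-neighbors classifier written from scratch; the feature vectors (links and typos per email) and labels are invented for illustration:

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    # Sort training points by squared Euclidean distance to x.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, x)), label)
        for p, label in zip(train_X, train_y)
    )
    # Majority vote among the k closest neighbors' labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical emails described by (links per email, typos per email).
train_X = [(0, 1), (1, 0), (7, 9), (8, 8), (1, 1), (9, 7)]
train_y = ["ham", "ham", "spam", "spam", "ham", "spam"]

print(knn_predict(train_X, train_y, (8, 9)))  # lands near the spam cluster
```

In practice you would use a library implementation such as Scikit-learn's, but the principle is the same: classify a new instance by the labels of its nearest training instances.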
UNSUPERVISED LEARNING
In unsupervised learning, as you might guess, the training data is
unlabeled. The system tries to learn without a teacher.
The algorithm tries to derive knowledge from the raw input without the
assistance of a set of pre-classified examples, and instead builds
descriptive models. A typical application of these algorithms
is found in search engines.
For example, say you have a lot of data about your blog’s visitors. You may
want to run a clustering algorithm to try to detect groups of
similar visitors. At no point do you tell the algorithm which group a visitor
belongs to: it finds those connections without your help. For example, it might
notice that 40% of your visitors are males who love comic books and
usually read your blog in the evening, while 20% are young sci-fi lovers
who visit during the weekends. If you use a hierarchical clustering
algorithm, it may also subdivide each group into smaller groups. This
may help you target your posts for each group.
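The visitor-grouping idea can be sketched with a minimal from-scratch k-means clusterer; the visitors below, described by invented (age, visits per week) pairs, are grouped with no labels given:

```python
def kmeans(points, k=2, iters=10):
    centers = list(points[:k])  # naive deterministic initialization
    clusters = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # Move each center to the mean of its assigned points.
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return clusters

# Hypothetical blog visitors: (age, visits per week).
visitors = [(18, 5), (20, 6), (19, 7), (45, 1), (50, 2), (48, 1)]
groups = kmeans(visitors, k=2)
print([len(g) for g in groups])  # two groups of three visitors each
```

The algorithm discovers the young frequent visitors and the older occasional ones on its own; a library implementation (e.g., Scikit-learn's KMeans) would add better initialization and convergence checks.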
REINFORCEMENT LEARNING
The algorithm learns based on the changes that occur
in the environment in which it operates. In fact, since every action has
some effect on the environment concerned, the algorithm is driven by
feedback from that same environment. Some of these algorithms are used in speech
or text recognition.
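A tiny sketch of this act-observe-update loop is an epsilon-greedy agent on a two-armed "bandit"; the environment's hidden reward probabilities below are invented for illustration:

```python
import random

random.seed(42)

true_reward = {"A": 0.2, "B": 0.8}   # hidden success probability per action
estimates = {"A": 0.0, "B": 0.0}     # the agent's learned value estimates
counts = {"A": 0, "B": 0}

for step in range(1000):
    # Epsilon-greedy: mostly exploit the best-looking action, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(["A", "B"])
    else:
        action = max(estimates, key=estimates.get)
    # The environment responds to the action with a reward.
    reward = 1 if random.random() < true_reward[action] else 0
    # Update: nudge the estimate toward the observed reward (running average).
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)
```

After enough steps, the agent's estimates track the environment's hidden rewards and it prefers the better action, driven purely by that feedback.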
How to build a machine learning model?
➢ Scraping data from the web, or gathering data from Kaggle or the UCI repository.
➢ Reading the data using Pandas or NumPy.
➢ Analysing the data using visualizations and plots, with the help of
excellent Python libraries like Matplotlib and Seaborn.
➢ Identifying correlations using a heatmap, and choosing the features to use in
predictions.
➢ Creating the final features and target DataFrames.
➢ Choosing promising models for the dataset and importing them
from Scikit-learn.
➢ Fitting the features to the model.
➢ Creating predictions.
➢ Analysing accuracy and loss.
➢ Checking for underfitting and overfitting.
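The steps above can be sketched end to end, assuming Pandas and Scikit-learn are installed; a synthetic dataset stands in for scraped or Kaggle data, and the feature names are invented:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Gather and read the data (here: generated instead of scraped or downloaded).
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
df = pd.DataFrame(X, columns=["f1", "f2", "f3", "f4"])
df["target"] = y

# Identify correlations (df.corr() is what a heatmap would visualize).
corr = df.corr()

# Create the final features and target DataFrames.
features = df.drop(columns="target")
target = df["target"]

# Choose a model, fit the features to it, and create predictions.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0)
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Analyse accuracy; comparing the train and test scores hints at overfitting
# (high train score with a much lower test score) or underfitting (both low).
train_acc = model.score(X_train, y_train)
test_acc = accuracy_score(y_test, predictions)
print(train_acc, test_acc)
```

With real data, the reading step would be `pd.read_csv` on your scraped or downloaded file, and the plotting steps would use Matplotlib or Seaborn, but the skeleton of the workflow is the same.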