
How to Understand and Implement Classification Algorithms

Classification algorithms are a powerful tool in any machine learning engineer’s arsenal.

Algorithms developed using classification machine learning have been shown to be a robust way to solve many real-world problems. Notably, they are used in research to predict the likelihood of a person developing a disease such as cancer.

There are many different types of classification machine learning algorithms to choose from.

This article outlines the different types of classification algorithm, how they work, and how to implement them using Python.

But first, when to use classification analysis in machine learning.

Ready to get started with Machine Learning Algorithms? Try the FREE Bootcamp

When is classification analysis used in machine learning?

Classification algorithms, like regression algorithms, are used in supervised machine learning. The difference is that classification works on discrete (non-continuous) data, whereas regression works on continuous data.

When we talk about supervised learning in ML, what we mean is that we have a specific set of training data for the algorithm to learn from. This training data contains all the inputs as well as the known output for each example in the data.

For example, will this person develop diabetes? Yes or No. In classification analysis, the labeled training data set will have a sample set of people and their characteristics alongside whether or not they developed diabetes.

This training data is used to teach the machine how different characteristics of a person’s genetics or lifestyle contribute to whether or not they will get diabetes. Based on these inputs (features), the model will then predict the probability that they will get diabetes.

Depending on the type of classification algorithm used, this probability will be calculated differently.


Classification analysis is an example of a problem relying on a non-continuous dataset, otherwise known as discrete data.

It is the opposite of problems involving continuous data, for which you would use regression analysis.

Want to understand regression analysis in more detail? Check out this post.

What are the different types of classification algorithms?

There are many different types of analysis you can run using classification algorithms.

1/ Logistic Regression

Logistic regression is a regression algorithm used to assign a probability of a point being in one class or the other.

The logistic regression algorithm plots a curve of probabilities that depend on the input variables. This curve runs between the true and false points. Each probability, or data point prediction, is calculated as an output of the logistic (sigmoid) function, hence the name logistic regression.

Logistic regression is a linear algorithm. Points that fall above the 0.5 probability are assigned a true classification, and those below are assigned false.

[Image: logistic regression curve. Source – Wikipedia logistic regression plot]

When you plot logistic regression predictions, you will see a split between the true and false zones in your dataset. These areas are separated by a straight line. Hence, logistic regression is a linear algorithm despite plotting a curve.

Now that took me a little while to get my head around!
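To make it concrete, here is a minimal scikit-learn sketch. The dataset is synthetic and purely illustrative, a stand-in for something like the diabetes example, not code from this post:

    # A minimal sketch: fit logistic regression on synthetic binary data
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # synthetic stand-in for a real labelled dataset
    X, y = make_classification(n_samples=200, n_features=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression().fit(X_train, y_train)

    # predict_proba gives the class probabilities; predict applies the
    # 0.5 threshold described above
    print(model.predict_proba(X_test[:5]))
    print(model.predict(X_test[:5]))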

Read more about logistic regression here.

2/ K-Nearest Neighbours (k-NN)

The k-nearest neighbours algorithm uses the points closest to the data point you are trying to predict to assign the probability of it belonging to one class or the other.

You have the option to choose the number of neighbours you analyse for each new data point; typically, people use 5.

[Image: k-NN classification with k = 5. Source – Wikipedia 5-NN plot]

The algorithm will assign the likelihood of an item fitting into each class. Once this is complete, you will obtain a decision boundary as shown in the diagram above.
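As a rough sketch in scikit-learn (again on synthetic, illustrative data):

    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)

    # n_neighbors=5 matches the typical choice mentioned above
    knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

    # the probability is the fraction of the 5 neighbours in each class
    print(knn.predict_proba(X[:5]))
    print(knn.predict(X[:5]))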

Read more about k-NN here

3/ Support Vector Machine Learning

Support vector classification attempts to maximise the margin: the distance between the decision boundary and the closest points in each class.

It uses the points closest to the boundary as the support vectors and creates a decision boundary between them.

As with logistic regression, this creates a linear decision boundary.

[Image: three candidate SVM decision boundaries (H1, H2, H3). Source – Wikipedia linear SVM plot]

The graph above shows three options for the decision boundary using SVM.

H3 is the best option of the three lines shown because:

  • It maximises the distance between data points and the line
  • It has the least incorrect assignments
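In scikit-learn, a linear SVM like H3 can be fitted in a few lines. This is a minimal sketch on synthetic data, not the post’s original code:

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)

    # kernel='linear' gives a straight-line boundary like H3 above
    svm = SVC(kernel='linear').fit(X, y)
    print(svm.predict(X[:5]))
    print(len(svm.support_vectors_))  # the points that define the margin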

3.1/ Kernel Support Vector Learning

But what happens if your data set cannot be split using a straight line? Well, then you need to look into kernel SVM.

Hold up, what on earth is a kernel?

Essentially, a kernel is a mathematical trick used to transform your data points into a higher dimension.

When I talk about dimensions I am using the word in the mathematical sense – we’re not sending your data out into the space-time continuum!

As an example, if you have data on one axis, it is one-dimensional. Data plotted on two axes (i.e. x and y) is two-dimensional, on three axes (x, y and z) is three-dimensional, and so on.

Transforming the data into a higher dimension is useful because, in the higher dimension, it may become possible to separate the classes with a straight line or plane.

[Image: a Gaussian kernel separating non-linear data. Source – Wikipedia SVM]

Depending on the shape of your data, you can use different kernel calculations to separate the points. Examples include the Gaussian (as shown above), sigmoid and polynomial kernels.

After you have created the separating plane, you transform the dataset back to the original dimension, which gives you a decision boundary for prediction.
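In scikit-learn, switching to a kernel SVM is a one-argument change. A sketch using concentric circles, a classic example of data no straight line can split (the dataset and parameters are illustrative):

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # concentric circles cannot be split by a straight line in 2-D
    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    # kernel='rbf' is the Gaussian kernel shown in the image above
    svm = SVC(kernel='rbf').fit(X, y)
    print(svm.score(X, y))  # near-perfect despite no straight-line split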

Read more about SVM and Kernels here

4/ Naive Bayes

The Naive Bayes algorithm attempts to assign a class to a data point by looking at the classes of data points with similar features.

It is known as naive because it assumes the features of a data point are independent of one another, an assumption that rarely holds exactly but works surprisingly well in practice.

To calculate the probability, the Naive Bayes classifier looks at:

  • The probability of observing the features of the data point, across the whole data set
  • The probability of a point falling into a class, out of the total data set
  • The probability that a point in that class has features similar to the data point
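Putting those three ingredients together gives Bayes’ theorem (the notation below is added for clarity and is not from the original post):

    P(class | features) = P(features | class) × P(class) / P(features)

The second bullet is the prior P(class), the third is the likelihood P(features | class), and the first is the evidence P(features), which normalises the result.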

Once the Bayes theorem calculation is computed, you can assign the data point to the class with the highest probability.
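A minimal scikit-learn sketch, again on synthetic stand-in data:

    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)

    # GaussianNB assumes each feature is normally distributed within a class
    nb = GaussianNB().fit(X, y)
    print(nb.predict_proba(X[:5]))  # the Bayes theorem probabilities
    print(nb.predict(X[:5]))        # the highest-probability class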

Read more about Naive Bayes here

5/ Decision Tree Classification

Decision tree classification splits the data into discrete sections that are arrived at following a set of binary classification decisions.

A prediction is then made by taking the most common class among the training data in the section where the new data point lands.
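A minimal scikit-learn sketch (synthetic data; the max_depth value is an illustrative choice, not a recommendation from this post):

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)

    # max_depth limits the number of binary splits, which helps avoid
    # the overfitting discussed later in this article
    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(tree.predict(X[:5]))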

Learn more about Decision Tree Classification here

6/ Random Forest Classification

Random forest classification uses the same process as decision tree classification above, but it creates multiple decision trees and then makes a prediction for your data point based on the majority vote across all of the trees in the forest.

You have the option to choose the number of trees you create; more trees generally give more stable predictions, at the cost of computation (scikit-learn’s default is 100).
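A minimal scikit-learn sketch on synthetic stand-in data:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)

    # n_estimators sets the number of trees in the forest
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(forest.predict(X[:5]))  # majority vote across the 100 trees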

Read more about Random Forest Classification here

What is overfitting? And why is it an issue?

Overfitting occurs when you fit an algorithm so that it classifies all of the training data correctly, noise and outliers included.

It is a problem because it forces the classifier to account for outliers in your dataset that might not be relevant when you try to use the model on new data.

When you overfit the model to the training set, it will not predict real-world data as well.
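One simple way to spot overfitting is to compare accuracy on the training set with accuracy on held-out test data. A sketch under the same synthetic-data assumption as the earlier examples (flip_y adds label noise so the gap shows up):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=200, n_features=4, flip_y=0.2,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # a large gap between these two scores is a classic sign of overfitting
    print(forest.score(X_train, y_train))  # often near 1.0
    print(forest.score(X_test, y_test))    # noticeably lower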

Below is an example of a Random Forest classifier that is overfitted to the training data set.

[Images: Random Forest classifier decision regions on the training set vs. the test set]

How do you evaluate which model to use?

There are two ways to evaluate your classification model:

  1. Confusion matrix
  2. CAP curve analysis

Confusion Matrix

The confusion matrix counts the points at which your model is ‘confused’, i.e. where it predicts incorrectly.

Fewer incorrect predictions = better model

                     Predicted True    Predicted False
    Actual True            20                 7
    Actual False            2                 6

Example of a confusion matrix showing 9 total incorrect predictions
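In scikit-learn, you can compute this matrix directly. A minimal sketch, with an illustrative model and synthetic data standing in for your own:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression().fit(X_train, y_train)
    # rows are the actual classes, columns the predicted classes
    print(confusion_matrix(y_test, model.predict(X_test)))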

CAP Curve Analysis

CAP (Cumulative Accuracy Profile) curve analysis looks at the difference between the results you would expect from a perfect model and from a random model. The model you have created is then assessed relative to the ideal model.

There are two ways to complete CAP curve analysis.

  1. Assess the area under the curve of your model vs. a perfect model – this can be quite time-consuming to compute
  2. Look at the 50% point on the X-axis and see where this equates to on the Y-axis.
[Image: CAP curve analysis. Source – Machine Learning A-Z: Hands-On Python & R In Data Science]

The Y-axis value at the 50% mark is then used to evaluate the classification model’s performance:

  • <60% – model is rubbish, try again
  • 60-70% – model is poor
  • 70-80% – model is good
  • 80-90% – model is excellent
  • 90%+ – model is too good, and you probably have a case of overfitting – try again
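If you want to compute the 50% figure yourself, here is one rough way to do it: rank the test points by predicted probability and check what fraction of the positives fall in the top half. An illustrative sketch, not code from the original post:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=400, n_features=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression().fit(X_train, y_train)

    # rank test points from most to least likely positive
    order = np.argsort(model.predict_proba(X_test)[:, 1])[::-1]
    gains = np.cumsum(y_test[order]) / y_test.sum()

    # fraction of positives captured in the top 50% of ranked points
    print(gains[len(gains) // 2])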

What are the differences between classification and regression analysis?

                        Classification                                  Regression
    Data type           Discrete                                        Continuous
    Problems used on    Identifying the likelihood that a data          Predicting a value based on a
                        point sits in one class or another              number of features
    Model evaluation    Confusion matrix and CAP curve analysis         P-values and adjusted R-squared

How do you implement classification algorithms using Python?

In this section, I have provided links to the documentation in Scikit-Learn for implementing classification.

Before you do any type of data analysis using classification algorithms, however, you need to clean your data.

This process is called data pre-processing and is essential for ensuring you get a good output from your algorithm.

Some steps to follow are:

  • Check for outliers in the data that could skew the results
  • Replace missing data points with the average value for that feature (one option, generally seen as better than removing the data point entirely)
  • Feature scaling: if your input variables are on very different scales, you may need to scale them so that no single variable dominates

The YouTube tutorial videos #37, 38 and 39 cover some techniques to do this here (link).
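As a small sketch of the missing-data and feature-scaling steps (the toy array is made up for illustration):

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    X = np.array([[1.0, 200.0],
                  [2.0, np.nan],   # a missing data point
                  [3.0, 250.0]])

    # replace missing values with the column average
    X = SimpleImputer(strategy='mean').fit_transform(X)

    # scale both features to zero mean and unit variance
    X = StandardScaler().fit_transform(X)
    print(X)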

Implementing classification algorithms in Python using the Scikit-Learn module:

The first step is to import the classification module you need, e.g. Naive Bayes.

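For example, one import per algorithm covered above (use whichever you need):

    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC                      # linear and kernel SVM
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier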

Then, depending on the type of classification algorithm you need, see the documentation links in each algorithm’s section above.

To create the graphs of the decision boundary, you can use the following code:

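(The original snippet was shared as an image; the sketch below is a stand-in showing one common matplotlib pattern, with an illustrative classifier and synthetic two-feature data.)

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=0)
    model = GaussianNB().fit(X, y)

    # evaluate the classifier over a grid covering the data
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.3)  # shaded class regions
    plt.scatter(X[:, 0], X[:, 1], c=y)  # the original data points
    plt.show()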

So there you have it. That is how to implement classification algorithms in Python.

This is a powerful analysis technique that can be used on multiple machine learning problems. What problems will you use it to solve?

Ready to get started with Machine Learning Algorithms? Try the FREE Bootcamp

Advertising Disclosure: I am an affiliate of Udemy and may be compensated in exchange for clicking on the links posted on this website. I only advertise courses I have found valuable and think will help you too.

If you have found this content helpful, I recommend the course linked below which gave me a baseline understanding of the materials and python code shared here.

Machine Learning A-Z: Hands-On Python & R In Data Science
