They say a picture speaks 1000 words. This alone should be enough to convince you of the importance of creating graphs in python.
When trying to communicate the insights you have gained from your machine learning project, a graph can go a long way.
To be a productive data scientist, you need to get good at visually representing data.
Don’t worry, though. Using some simple libraries for Python, you can easily create beautiful charts that will delight your readers.
I’m talking all the colours and inline plots that help your data come to life gloriously.
Let’s get started by thinking about the different charts you want to create.
Different types of graphs in python
Some things to consider:
- Type of data
- What you want to understand
- Number of features
Different types of visualization
Now you know why you need to get good at making graphs in python, let’s talk about how you do it.
2D Graphs in python
There are plenty of different options when it comes to 2-D graphs in python.
Depending on the data you are trying to visualize, you can use different 2-D visualizations to help show it in the best light.
Here are just a few examples of some of the different graphs in python you can use depending on what you are trying to show.
A histogram is excellent for showing a simple visual representation of your data. It groups the values into ranges (bins), so you can see how many items fall into each one.
Many people use histograms to get a simple understanding of the data when they first load it into pandas or Jupyter notebooks and perform straightforward checks on it.
Below is an example of a histogram in action.
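As a minimal sketch, here is how you might draw a histogram with Matplotlib. The data is randomly generated purely for illustration, and the bin count and output file name are arbitrary choices of mine.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 500 values drawn from a normal distribution.
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=500)

fig, ax = plt.subplots()
# hist() returns the count in each bin, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax.hist(values, bins=20, edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Number of items")
ax.set_title("Histogram of 500 sample values")
fig.savefig("histogram.png")
```

Changing `bins` is usually the first thing to experiment with: too few bins hide the shape of the data, too many make it noisy.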
A scatter graph is great for showing how one variable changes depending on another and how different data points are related to one another.
Scatter graphs are one of the most widely used graphs in python.
Its popularity stems from the fact that it’s a very flexible graph and you can show a lot with it. Below are a few examples of scatter graphs that you can use yourself.
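Here is a small sketch of a scatter graph in Matplotlib. The two variables (hours studied and test score) are invented for the example, with some random noise added so the relationship is visible but not perfect.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: score rises with hours studied, plus random noise.
rng = np.random.default_rng(seed=1)
hours = rng.uniform(0, 10, size=100)
score = 5 * hours + rng.normal(0, 5, size=100)

fig, ax = plt.subplots()
# alpha makes overlapping points easier to see.
points = ax.scatter(hours, score, alpha=0.7)
ax.set_xlabel("Hours studied")
ax.set_ylabel("Test score")
ax.set_title("Score against hours studied")
fig.savefig("scatter.png")
```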
The heatmap is a compelling data visualization.
You can use heat maps to show how popular different data points are within an experimental space.
For example, in object detection, heatmaps can show which parts of the image are the most useful to the algorithm when determining what class an object is in.
Heatmaps are a great way of showing the importance of different variables, as well as how common one item is compared to another.
Below is an example of a heat map generated from an object detection algorithm.
This heat map shows which area the algorithm has used to identify the object, in this case, a dog.
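To give a feel for the mechanics, here is a rough sketch of drawing a heatmap with Matplotlib. Note the assumption: the grid below is a synthetic "hot spot" I generate with NumPy, not the output of a real object detection model.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 64x64 grid with one hot spot centred at column 40, row 24.
y, x = np.mgrid[0:64, 0:64]
heat = np.exp(-((x - 40) ** 2 + (y - 24) ** 2) / (2 * 8.0 ** 2))

fig, ax = plt.subplots()
# imshow() colours each cell according to its value.
image = ax.imshow(heat, cmap="hot")
fig.colorbar(image, ax=ax, label="Importance")
ax.set_title("Synthetic heatmap")
fig.savefig("heatmap.png")
```

With a real model you would replace `heat` with whatever importance grid your algorithm produces.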
A box plot is similar to a scatter graph, but instead of showing all of the points, it shows a box. This box gives you information about the distribution of your data at a certain point.
- The line inside the box represents the median.
- The size of the box represents the interquartile range.
- The lines on either side of the box (the whiskers) show the range of the data, with outliers often drawn as separate points beyond them.
Below is an example of a box plot.
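A minimal box plot sketch in Matplotlib could look like this. The three groups are random samples I made up for the example. By default Matplotlib draws the median as the line inside each box.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: three groups with different centres.
rng = np.random.default_rng(seed=2)
groups = [rng.normal(loc=centre, scale=1.0, size=200) for centre in (0, 1, 2)]

fig, ax = plt.subplots()
# boxplot() returns a dict of the drawn parts (boxes, medians, whiskers...).
result = ax.boxplot(groups)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["Group A", "Group B", "Group C"])
ax.set_ylabel("Value")
fig.savefig("boxplot.png")
```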
A pie chart is another straightforward graph in python that can be used to compare different proportions of variables. The challenge with pie charts is that they are difficult to read accurately.
Pie charts are only suitable for giving a quick overview of data.
Despite this, they can be an excellent way of showing data to people in a simple way.
Below is an example of a pie chart.
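As a quick sketch, a pie chart in Matplotlib takes a list of values and a list of labels. The categories and percentages here are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt

# Made-up proportions that add up to 100.
labels = ["Python", "R", "SQL", "Other"]
shares = [45, 25, 20, 10]

fig, ax = plt.subplots()
# autopct prints the percentage inside each slice.
ax.pie(shares, labels=labels, autopct="%1.0f%%")
ax.set_title("Share of each category")
fig.savefig("pie.png")
```

Printing the percentages on the slices helps with the readability problem mentioned above.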
A correlation matrix is a very common data visualization within data science. These matrices are generated to show how strongly each pair of variables is correlated.
This correlation score can be very useful when you are trying to understand the relative importance of different inputs.
Correlation matrices also help you to remove insignificant inputs before making predictions.
We will discuss later why you might want to do this data removal to help your algorithms run more efficiently.
Below is an example of a couple of different correlation matrices and how they have been used.
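Here is one way you might compute and draw a correlation matrix using only NumPy and Matplotlib. The three variables are made up: `b` is deliberately built from `a`, while `c` is unrelated, so you can see a strong and a weak correlation in the same picture.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=3)
a = rng.normal(size=200)
b = 0.8 * a + 0.2 * rng.normal(size=200)  # strongly related to a
c = rng.normal(size=200)                  # unrelated to a and b
names = ["a", "b", "c"]

# corrcoef() treats each row as one variable.
corr = np.corrcoef(np.vstack([a, b, c]))

fig, ax = plt.subplots()
image = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(3))
ax.set_xticklabels(names)
ax.set_yticks(range(3))
ax.set_yticklabels(names)
# Write the correlation score in each cell.
for i in range(3):
    for j in range(3):
        ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")
fig.colorbar(image, ax=ax, label="Correlation")
fig.savefig("correlation_matrix.png")
```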
3D graphs in python
3-D graphs in python can be useful in certain circumstances, for example, when you are trying to show how more than two variables relate to one another.
Here is an example of a 3-D visualization showing the experimental space and looking for local minima. A local minimum is essential to find when you are trying to optimize your algorithm. Looking at the area in 3-D can help you see which of these local minima is your best option.
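A 3-D surface like this can be sketched with Matplotlib's built-in 3-D axes. The function being plotted is an arbitrary bumpy surface I chose because it has several dips. It is not taken from a real model.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Build a grid of (x, y) positions and a height for each one.
x = np.linspace(-3, 3, 60)
y = np.linspace(-3, 3, 60)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.cos(Y) + 0.1 * (X ** 2 + Y ** 2)  # arbitrary bumpy surface

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_xlabel("Parameter 1")
ax.set_ylabel("Parameter 2")
ax.set_zlabel("Loss")
fig.savefig("surface.png")
```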
A word of warning, though: 3-D visualization can make things more challenging to interpret.
It is only recommended to use these in certain circumstances where it makes sense. It is not recommended to use these just to make your projects look more exciting or technical.
I know it’s tempting, but trust me, in most situations a 2-D visualization will do the job much better.
The final example of data visualization we will look at is animation. This is not really a separate type of graph, as it is basically a 2-D graph in python updating over time.
However, they can look pretty cool.
It is also great to use animation to help explain how different variables, when altered, can change the results of your algorithm, for example, to show the impact of updating weights and biases during gradient descent.
With an animation, you can see how over time the algorithm is updating and optimizing itself to make better predictions.
You use animated graphs in python in circumstances where the results change over time, because watching the change happen makes it easier for someone to understand.
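As a rough sketch of the gradient descent idea, the code below animates a single weight sliding down a simple loss curve using Matplotlib's `FuncAnimation`. The loss function, starting weight, and learning rate are all assumptions chosen to keep the example tiny.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

def loss(w):
    return (w - 3.0) ** 2  # toy loss with its minimum at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)

# Pre-compute the path gradient descent takes from the starting weight.
weights = [-4.0]
for _ in range(30):
    weights.append(weights[-1] - 0.1 * gradient(weights[-1]))
weights = np.array(weights)

fig, ax = plt.subplots()
grid = np.linspace(-5, 11, 200)
ax.plot(grid, loss(grid))
(marker,) = ax.plot([], [], "ro")
ax.set_xlabel("Weight")
ax.set_ylabel("Loss")

def update(frame):
    # Move the red dot to the weight reached at this step.
    w = weights[frame]
    marker.set_data([w], [loss(w)])
    ax.set_title(f"Step {frame}")
    return (marker,)

animation = FuncAnimation(fig, update, frames=len(weights), interval=100)
# To write a GIF, uncomment the next line:
# animation.save("gradient_descent.gif", writer="pillow")
```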
Pros and cons of different data visualizations
| Type of Visualisation | Pro | Con |
|---|---|---|
| 2D | Easy to interpret and flexible | A limited number of variables can be included |
| 3D | Great for showing the experimental space across multiple variables | Can be tough to interpret without proper explanations |
| Animation | An effortless way to explain updating results | Only suitable for certain types of data series |
Feature reduction and visualizing multiple variables
Another critical part of data visualization is feature reduction.
Bear with me here – I’m going to explain this by making you do a visualization!
It’s straightforward to imagine something in 2-D. You can also imagine things in 3-D without too much difficulty. Or at least I hope you can.
In a 3-D visualization, the maximum number of features you can look at in the majority of cases is three. If you add color as another layer to identify a different feature, you can get to four. Beyond that, it is challenging to imagine what a feature space looks like.
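To make the "three axes plus color" idea concrete, here is a small sketch of a 3-D scatter plot in Matplotlib where a fourth feature sets the color of each point. The four features are random numbers generated just for the example.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up dataset: 150 rows, 4 features.
rng = np.random.default_rng(seed=4)
data = rng.normal(size=(150, 4))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# Features 1-3 set the position, feature 4 sets the color.
points = ax.scatter(data[:, 0], data[:, 1], data[:, 2],
                    c=data[:, 3], cmap="viridis")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.set_zlabel("Feature 3")
fig.colorbar(points, ax=ax, label="Feature 4")
fig.savefig("four_features.png")
```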
This is why when you are looking at creating graphs in python, you need to understand which are the most essential features. These are the ones that you want to use when you are imagining the data.
It can also make it a lot easier to interpret data if you remove features that are not important. This helps make sure that your predictions only use data that has an impact on the result.
This is important not only for visualizing but also for making sure that your algorithms are running efficiently.
The primary way that this is done is through feature dimensionality reduction.
What is dimensionality reduction?
Dimensionality reduction is a systematic process by which you reduce the number of features in your dataset, keeping only the information that is crucial for making predictions.
For now, all you need to know is that the most common way that you will complete feature reduction on your dataset is using the PCA algorithm. PCA stands for principal component analysis.
Strictly speaking, PCA does not pick which of your original features to keep. It combines them into a smaller set of new features, called principal components, that capture most of the variation in your dataset, and you drop the components that contribute little.
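Here is a minimal sketch of PCA written with plain NumPy, so you can see what is happening under the hood. The dataset is synthetic: five features that I build from two hidden directions plus a little noise. In a real project you would more likely reach for a library implementation such as the `PCA` class in scikit-learn.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 200 samples, 5 features, driven by 2 hidden directions.
rng = np.random.default_rng(seed=5)
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = hidden @ mixing + 0.05 * rng.normal(size=(200, 5))

# PCA: centre the data, then take the singular value decomposition.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance captured by each principal component.
explained = S ** 2 / np.sum(S ** 2)

# Project onto the first two components so the data can be plotted in 2-D.
X_2d = X_centered @ Vt[:2].T

fig, ax = plt.subplots()
ax.scatter(X_2d[:, 0], X_2d[:, 1], alpha=0.7)
ax.set_xlabel("Principal component 1")
ax.set_ylabel("Principal component 2")
fig.savefig("pca.png")
```

The `explained` array is worth printing: it tells you how much of the variation you keep when you cut the data down to the first few components.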
OK, so you have the right number of features. Now you need to make attractive graphs in python.
Making your graphs in python more visually appealing
All of the data visualizations that we have covered here today can be implemented using the Matplotlib library in Python.
These simple graphs in python will be more than enough to help you get a basic understanding of your data before running predictive analysis.
However, when it comes to graphs in python, sometimes okay is not good enough. I know making something look beautiful may seem a little superficial.
People can turn their noses up at this idea, but these things matter.
If you are trying to get people interested in your data, you want to make sure that your visualizations look as good as possible. Don’t worry, you’re not going to have to invest heavily to make your graphs beautiful.
Instead, you can just use the Seaborn library.
Seaborn makes your graphs in python much more attractive. It allows you to use more beautiful colours and just presents the data in a much more visually appealing way.
You also have a lot more control over the different aspects of your visualizations, making them easier to interpret.
More importantly, it’s no more difficult to implement data visualizations in your projects using Seaborn than it is using Matplotlib.
Note that neither Matplotlib nor Seaborn ships with Python itself. Both are third-party packages, so if they are not already in your environment, you will need to install them first, for example with pip.
The below image shows the difference between Matplotlib and Seaborn side by side.
Which do you prefer?
How to implement different plots in python
In this tutorial section, I will share the code and output graph in python from a variety of different visualizations I created for a data science project.
These examples will show you how you can use different graphs in python to display your data.
Correlation Matrix/Heat Map
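As an illustration of this kind of plot, here is a sketch that builds a correlation matrix with pandas and draws it as a heat map with Seaborn. To be clear about the assumptions: the housing-style columns and numbers are invented for the example and are not the data from my project.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up dataset: price depends mostly on size and a little on age.
rng = np.random.default_rng(seed=6)
size = rng.uniform(50, 200, 100)
df = pd.DataFrame({
    "size_m2": size,
    "rooms": np.round(size / 30 + rng.normal(0, 0.5, 100)),
    "age_years": rng.uniform(0, 50, 100),
})
df["price"] = (2000 * df["size_m2"] - 500 * df["age_years"]
               + rng.normal(0, 20000, 100))

# Pairwise correlation between every pair of columns.
corr = df.corr()

fig, ax = plt.subplots()
# annot=True writes the correlation score inside each cell.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the colours comparable between different datasets.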
Now it’s your turn!
There are plenty of different graphs in python for you to try out.
In this tutorial we have covered:
- Why you need to use data visualizations
- Different ways to visualize data
- Examples of graphs in python (with the code)
Now it’s your turn to have a go. Which graph in python are you most excited by? Let me know in the comments.