Restricted Boltzmann Machines (RBMs) are unsupervised deep learning algorithms that are often applied in recommendation systems.
Recommendation systems are an area of machine learning that many people, regardless of their technical background, will recognise.
You see the impact of these systems everywhere!
A very basic example of a recommendation system is the Apriori algorithm. However, deep learning allows recommendation technology to become much more sophisticated.
There are several deep learning algorithms currently being used within recommendation systems. However, RBMs in particular seem to be of great interest to people as they start practising deep learning.
The number one question I have received over the last few months on deep learning is how to implement RBMs using Python.
Today I am going to go into how to create your own simple RBM from scratch using Python and PyTorch.
But before I start I want to make sure we all understand the theory behind Boltzmann Machines and how they work.
Trust me, this will help you a lot when trying to understand the code!
The joy of something familiar
Finally, my chemistry master's degree is having some use in this journey into data science!
Forgive me a moment for taking time out to celebrate the joy of familiarity but I do have a point I promise!
Most people in the machine learning space find Boltzmann distribution models terrifying at first pass. I, on the other hand, was delighted to finally see something I recognized!
It’s funny how perspective can change your approach.
Despite these algorithms being some of the more challenging ones to understand, I actually found I was able to pick up the theory fairly easily.
And I know what you’re thinking, it’s because I studied them in Chemistry.
Trust me when I tell you that didn’t help me at all.
Aside from a name and one key concept I will explain to you shortly, there is nothing in the high-level theory that would give people with a chemistry degree an advantage.
However, because I entered into the learning process believing I already had the capacity to fully understand Boltzmann and any machine created in his name, I found learning this complex topic much simpler than expected.
I invite you to forget what you may have heard about how complex Boltzmann Machines are and join me in this positive headspace.
Are you ready?
Ok let’s dive in – You got this!
What is a Boltzmann Machine?
The main thing you need to know to understand a Boltzmann Machine is that it is an energy-based model.
This energy-based approach is where the overlap with chemistry comes into play.
In chemistry, Boltzmann’s constant is used when looking at the relative energy of particles in a system.
The particles will always aim to sit at a distance from one another that minimizes the energy of the overall system. One thing to remember is that the particles themselves carry an electrical charge, and this charge brings energy into the system.
To minimize the energy of the total system, you want to maximize the distance between the repelling charges present in the particles.
I am not going to go too deep into the theory behind entropy (the value associated with how ordered or disordered a system is), energy and Boltzmann.
All you need to understand is that in an energy-based system the aim of the game is to minimize energy: the lower the energy of a state, the more likely the system is to be found in it.
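If you would like to see that idea in numbers, here is a minimal sketch of the Boltzmann distribution in plain Python. The function name `boltzmann_probabilities` and the three example energies are made up for illustration, and the sketch assumes units where Boltzmann's constant is 1, so `temperature` is just a scaling number.

```python
import math


def boltzmann_probabilities(energies, temperature=1.0):
    """Turn a list of state energies into probabilities.

    Each state gets a weight exp(-E / T); dividing by the total
    makes the weights sum to 1. Lower energy means higher weight.
    """
    weights = [math.exp(-energy / temperature) for energy in energies]
    total = sum(weights)
    return [weight / total for weight in weights]


# Three hypothetical states with increasing energy.
probs = boltzmann_probabilities([0.0, 1.0, 2.0])
for energy, prob in zip([0.0, 1.0, 2.0], probs):
    print(f"energy={energy:.1f}  probability={prob:.3f}")
```

The lowest-energy state comes out as the most probable one, which is exactly the behaviour an energy-based model relies on.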
Now that you understand the main thing about Boltzmann machines as an energy-based system, we can move onto what they do.
Unsupervised feature detection
Ok, so Boltzmann machines are energy-based.
How does this relate to deep learning?
I’m so glad you asked!
I am actually going to dive into some more science to explain this one.
(Don’t worry this is not going to be a super technical mathematical derivation)
Think about the planet.
Now the majority of us acknowledge that global warming is a very real threat to our planet.
Think about some of the reasons for global warming.
Some that come to my mind are carbon dioxide production, human activity, nitrous oxide production and so on.
But we don’t know for sure all the factors causing global warming.
There could be an infinite number of factors causing global warming. Some of these causes will be more obvious than others. Some factors will be significant and others will not.
The fact of the matter is that we do not fully understand it.
Bringing it back to Boltzmann Machines
What a Boltzmann machine does is take the input you give it and then try to identify all of the features that influence, or are related to, that input.
As it works to minimize the energy of the system, it generates these features through unsupervised learning.
Some of these features generated we might be able to understand, but others will be a mystery to us.
What you need to understand is that not understanding all the features is fine. You can just sit back and let the algorithm do its job.
Ok great, so what does this system look like?
The structure of a Boltzmann Machine
The below image shows a schematic representation of a BM. Do you notice anything weird?
Did you spot it?
There is no output layer.
A neural network without an output layer??? What does it mean?
Boltzmann Machines are undirected neural networks.
This means that information doesn’t flow through the network in one direction, from input to output.
Instead, every node is connected to every other node, so the data (and energy) can move through the network in any way it likes, with hidden nodes coming to represent different features.
Once you have given the BM the input data, this is immediately absorbed into the system and becomes part of it. After the algorithm is initialized, all of the data and features are treated equally.
What I mean by this is the input layer becomes just another set of features to be used by the system as it likes.
What is the biggest challenge associated with BM in practice?
You can imagine how powerful these Boltzmann Machines can be.
They are able to create data (features) from nothing and become self-sustaining.
However, there is one slight problem.
Try to imagine again all the potential factors that could impact global warming. After a few obvious ones, it starts to get a bit complicated.
Keep thinking about it long enough and you will find yourself going deep down into a rabbit hole.
This rabbit hole is a problem for BMs, because modelling all of the factors and all of the connections between them would require A LOT of processing power.
From a practical point of view, they are just not scalable.
We need a solution to this problem – and luckily someone has come up with one!
What are restricted Boltzmann Machines? And why do you need them?
A Restricted Boltzmann machine (or RBM) is the solution to this problem.
Hallelujah! All hail!
So what is an RBM?
A Restricted Boltzmann machine is a stochastic artificial neural network.
What that means is that each neuron switches on with a probability, rather than deterministically, and these random variations let the network explore its way towards low-energy states.
Repeatedly sampling the neurons like this is known as Gibbs sampling. RBMs are usually trained with an algorithm called contrastive divergence, which is built on it (this is not the same thing as stochastic gradient descent).
Similar to the BMs we discussed previously, RBMs do not have one direction the data moves in.
Each of the neurons in the system has a binary value that is activated, or not, depending on the values of the other neurons in the system.
The key difference is the restriction: an RBM has one visible layer and one hidden layer, and neurons only connect across the two layers, never to other neurons in the same layer.
Look at the structure of a restricted Boltzmann machine below. You will see it looks a lot simpler.
As the algorithm works through the system, updating the values of the neurons via the activation function, the energy of the system goes down.
Eventually, the algorithm will converge on a minimized value and you’re done.
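To make the energy idea concrete, here is a small sketch in plain Python of the standard RBM energy function and of the probability that a hidden neuron switches on. The names `rbm_energy`, `p_hidden_on`, `v_bias` and `h_bias` are illustrative choices, not part of any library, and the weights are invented numbers.

```python
import math


def rbm_energy(v, h, W, v_bias, h_bias):
    """Energy of one (visible, hidden) configuration.

    E(v, h) = -sum_i v_bias[i]*v[i] - sum_j h_bias[j]*h[j]
              - sum_ij v[i]*W[i][j]*h[j]
    W[i][j] links visible neuron i to hidden neuron j.
    """
    visible_term = sum(b * x for b, x in zip(v_bias, v))
    hidden_term = sum(b * x for b, x in zip(h_bias, h))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -(visible_term + hidden_term + interaction)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_hidden_on(v, W, h_bias, j):
    """Probability that hidden neuron j is 1, given the visible layer."""
    total_input = h_bias[j] + sum(v[i] * W[i][j] for i in range(len(v)))
    return sigmoid(total_input)


# Hypothetical toy network: 3 visible neurons, 2 hidden neurons.
v = [1, 0, 1]
h = [1, 0]
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
v_bias = [0.0, 0.0, 0.0]
h_bias = [0.0, 0.0]

energy = rbm_energy(v, h, W, v_bias, h_bias)
p_first_hidden = p_hidden_on(v, W, h_bias, 0)
print(energy, p_first_hidden)
```

Because there are no connections inside a layer, each hidden neuron's probability depends only on the visible layer, which is what makes the restricted version so much cheaper to work with.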
An overview of the implementation of RBMs
Ok, I hope you’ve managed to stick with me so far as we explore the theory behind restricted Boltzmann machines.
Now it’s time for the fun stuff – implementation!
Broadly speaking there are 6 steps to creating an RBM.
- Obtaining the data and preprocessing
- Shaping the data for the neural network
- Converting the data to Torch tensors (to be used with PyTorch)
- Creating the architecture of the neural network. You will need to build a class that can:
  - Build out the activation function
  - Calculate the probability of a neuron being activated
  - Update weights and biases
- Training the RBM
- Testing the RBM
That’s a broad overview of the different steps involved. Now it’s time to look at the code!
Implementation of RBMs step by step
One of the common implementations of RBMs is in recommendation systems.
In the code below, I have added notes to source code from the Udemy Deep Learning A-Z course.
Advertising Disclosure: I am an affiliate of Udemy and may be compensated if you click on the links posted on this website. I only advertise courses I have found valuable and think will help you too. If you have found this content helpful, I recommend the course linked below, which gave me a baseline understanding of the materials and Python code shared here.
Step 1: Obtaining the data and preprocessing
As always the first step in the process of creating any machine learning model is to clean your data!
Below is a YouTube tutorial series that runs through the basic steps of data cleaning using pandas in Python.
Don’t skip this step or you will be sorry when your model is junk 🙁
You also need to import the required libraries for your model (see the image below).
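As a rough idea of what the cleaning step does, here is a dependency-free sketch using only Python's standard library. The column names (`user_id`, `movie_id`, `rating`) and the sample rows are hypothetical. In a real project you would more likely do the same job with the pandas methods `dropna` and `drop_duplicates`.

```python
import csv
import io

# Hypothetical raw ratings file with one duplicate row and one missing rating.
RAW = """user_id,movie_id,rating
1,1,5
1,3,3
1,3,3
2,2,
2,2,4
"""


def load_clean_ratings(handle):
    """Read (user, movie, rating) rows, skipping incomplete and duplicate rows."""
    seen = set()
    rows = []
    for row in csv.DictReader(handle):
        if not all(row.values()):  # at least one field is empty
            continue
        record = (int(row["user_id"]), int(row["movie_id"]), int(row["rating"]))
        if record in seen:  # exact duplicate of an earlier row
            continue
        seen.add(record)
        rows.append(record)
    return rows


ratings = load_clean_ratings(io.StringIO(RAW))
print(ratings)  # [(1, 1, 5), (1, 3, 3), (2, 2, 4)]
```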
Step 2: Shaping the data for the neural network
The second step is to shape your data so that it can be read by the neural network.
Restricted Boltzmann Machines like to have their input data in the form of a 2D array. For example, if you are looking to make a recommendation system for movies, you can have an array with movies as the columns and users as the rows.
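Here is a minimal sketch of that reshaping in plain Python. It assumes the ratings arrive as `(user_id, movie_id, rating)` tuples with IDs starting at 1, and that 0 stands for "this user has not rated this movie". Both are assumptions for the example, as is the function name `to_user_movie_matrix`.

```python
def to_user_movie_matrix(ratings, nb_users, nb_movies):
    """Build a users x movies grid; unrated cells stay at 0."""
    matrix = [[0] * nb_movies for _ in range(nb_users)]
    for user_id, movie_id, rating in ratings:
        # IDs are assumed to be 1-based, list indexes are 0-based.
        matrix[user_id - 1][movie_id - 1] = rating
    return matrix


# Hypothetical ratings: user 1 rated movies 1 and 3, user 2 rated movie 2.
ratings = [(1, 1, 5), (1, 3, 3), (2, 2, 4)]
matrix = to_user_movie_matrix(ratings, nb_users=2, nb_movies=3)
print(matrix)  # [[5, 0, 3], [0, 4, 0]]
```

Each row is now one user and each column one movie, which is the shape the visible layer of the RBM expects.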
Step 3: Converting the data to Torch tensors
Now it is time to get your data ready for use in PyTorch.
You will need to check your data type before doing this, as Torch tensors are multidimensional matrices that can only contain one data type. If you do not use the correct one you may get an error.
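A short sketch of the conversion, assuming the ratings already sit in a list of lists. The binarisation rule at the end (0 becomes -1 for "not rated", 1 to 2 stars becomes 0, 3 stars or more becomes 1) is one possible convention chosen for this example, because RBM neurons are binary. It is an assumption, not a requirement.

```python
import torch

# Hypothetical users x movies ratings, 0 meaning "not rated".
ratings = [[5, 0, 3], [0, 4, 1]]

# FloatTensor stores every value as a 32-bit float, so the dtype is uniform.
tensor = torch.FloatTensor(ratings)
print(tensor.dtype)  # torch.float32

# Assumed convention: -1 = not rated, 0 = disliked, 1 = liked.
binary = tensor.clone()
binary[tensor == 0] = -1
binary[(tensor >= 1) & (tensor <= 2)] = 0
binary[tensor >= 3] = 1
print(binary)
```

The masks are all computed from the original `tensor`, so one rule never overwrites the result of another.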
Step 4: Creating the architecture of the Neural Network
Ok so this is where it gets a bit more complicated.
In order to generate an RBM you will need to create your own class.
If you have never done this before then I recommend watching the below tutorial to get familiar with the process.
The class required for a Restricted Boltzmann Machine needs to do the following:
- Build out the activation function
- Calculate the probability of a neuron being activated
- Update weights and biases
Below is an annotated example from the training I completed in the Deep Learning A-Z™: Hands-On Artificial Neural Networks course.
You can see that there is a lot of math happening here. Check my notes to understand what is happening.
I also recommend trying the course for yourself if you do not understand – it is well covered there and often available for $9.99!
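A minimal sketch of such a class in PyTorch could look like the following. Treat it as an illustration rather than the course's exact code: the method names `sample_h`, `sample_v` and `train`, the attribute names, the small random initial weights and the learning rate `lr` are all assumptions made for this example.

```python
import torch


class RBM:
    def __init__(self, n_visible, n_hidden):
        # W[i, j] links visible neuron i to hidden neuron j.
        self.W = torch.randn(n_visible, n_hidden) * 0.1
        self.v_bias = torch.zeros(1, n_visible)
        self.h_bias = torch.zeros(1, n_hidden)

    def sample_h(self, v):
        """Probability of each hidden neuron being on, plus a 0/1 sample."""
        p_h = torch.sigmoid(torch.mm(v, self.W) + self.h_bias)
        return p_h, torch.bernoulli(p_h)

    def sample_v(self, h):
        """Probability of each visible neuron being on, plus a 0/1 sample."""
        p_v = torch.sigmoid(torch.mm(h, self.W.t()) + self.v_bias)
        return p_v, torch.bernoulli(p_v)

    def train(self, v0, vk, ph0, phk, lr=0.1):
        """Contrastive divergence update.

        v0, ph0: input batch and its hidden probabilities.
        vk, phk: the same after k rounds of Gibbs sampling.
        """
        batch_size = v0.size(0)
        self.W += lr * (torch.mm(v0.t(), ph0) - torch.mm(vk.t(), phk)) / batch_size
        self.v_bias += lr * torch.mean(v0 - vk, dim=0, keepdim=True)
        self.h_bias += lr * torch.mean(ph0 - phk, dim=0, keepdim=True)
```

The sigmoid is the activation function, `sample_h` and `sample_v` calculate the activation probabilities, and `train` updates the weights and biases, so the three jobs from the list each have a home.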
Step 5: Training the RBM
Congratulations, you made it through creating your own class!
The code below is an example of how to:
- Initialise the epochs
- Create batches for training
- Update the weights
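These three steps can be sketched as a single training function. It assumes an object `rbm` that exposes `sample_h`, `sample_v` and `train` methods (hypothetical names), a 2D float tensor `training_set` with users as rows, and the convention that -1 marks a missing rating. The default values for `nb_epoch`, `batch_size` and `k_steps` are placeholders, not recommendations.

```python
import torch


def train_rbm(rbm, training_set, nb_epoch=10, batch_size=100, k_steps=10):
    """Train with contrastive divergence; return the mean loss per epoch."""
    nb_users = training_set.size(0)
    losses = []
    for epoch in range(nb_epoch):
        epoch_loss, nb_batches = 0.0, 0
        # Walk through the users one batch at a time.
        for start in range(0, nb_users, batch_size):
            v0 = training_set[start:start + batch_size]
            vk = v0.clone()
            ph0, _ = rbm.sample_h(v0)
            # k rounds of Gibbs sampling: visible -> hidden -> visible.
            for _ in range(k_steps):
                _, hk = rbm.sample_h(vk)
                _, vk = rbm.sample_v(hk)
                vk[v0 < 0] = v0[v0 < 0]  # keep missing ratings untouched
            phk, _ = rbm.sample_h(vk)
            rbm.train(v0, vk, ph0, phk)  # update weights and biases
            rated = v0 >= 0
            if rated.any():
                diff = torch.abs(v0[rated] - vk[rated])
                epoch_loss += torch.mean(diff).item()
                nb_batches += 1
        losses.append(epoch_loss / max(nb_batches, 1))
        print(f"epoch {epoch + 1}: loss {losses[-1]:.4f}")
    return losses
```

The loss here is the average absolute difference between the real and reconstructed ratings, counted only where a rating exists, so a value close to 0 means the reconstructions match the data well.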
Step 6: Testing the RBM
The final step is to test the trained RBM on new data.
In this scenario you can copy down a lot of the code from training the RBM. Then update it so that you test on one batch containing all the data, and remove the calculations you no longer need.
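A stripped-down version of that test step might look like this. As before, it assumes an object `rbm` with `sample_h` and `sample_v` methods (hypothetical names), two float tensors of the same shape, and -1 as the marker for a missing rating. The function name `evaluate_rbm` is also just an illustration.

```python
import torch


def evaluate_rbm(rbm, training_set, test_set):
    """Mean absolute error on the ratings present in the test set.

    The training ratings activate the hidden neurons, one
    visible -> hidden -> visible pass produces predictions, and only
    entries actually rated in the test set (value >= 0) are scored.
    """
    _, h = rbm.sample_h(training_set)
    _, v_pred = rbm.sample_v(h)
    rated = test_set >= 0
    if not rated.any():
        return float("nan")  # nothing to score
    return torch.mean(torch.abs(test_set[rated] - v_pred[rated])).item()
```

There is no weight update and no loop over epochs or batches, which is the redundant work you get to delete compared with training.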
Other types of Boltzmann Machine to be aware of
| Algorithm | What the model does |
| Deep BM | An unsupervised, probabilistic, generative model that, like the Boltzmann Machine, is undirected. The structure of a deep Boltzmann machine is closer to an RBM, but with multiple hidden layers. |
| Stacked BM | Multiple layers formed of stacked RBMs, where the system flows up and then updates weights going back down the model. |
| Quantum BM | Currently in research and development, this Boltzmann machine would leverage a structure similar to the quantum Boltzmann distribution. |
Bonus Code: Advanced Practice code
Full disclosure: I have not done this experiment at the time of writing.
Below is a link to a practice project for more advanced Boltzmann Machine generation. When you’re feeling brave give it a go!
Everything we have learned about Restricted Boltzmann Machines
I hope you can now see how Boltzmann Machines are not as scary as you might have imagined!
The important things to understand are:
- They are energy based models – looking to minimise the energy
- Boltzmann Machines are unsupervised deep learning techniques
- Restricted Boltzmann Machines are more practical to implement and require less processing
- You can implement an RBM by creating a class to run the algorithm
- There are multiple advanced and powerful types of Boltzmann Machines
Most important of all, you can make an RBM even if you don’t have a lot of experience!!