DBNs, RBM seminar
We held an AKA AI seminar last Friday. Our AI centers in the US and South Korea came together, and Andy led the seminar. The summary of the seminar is as follows:
RBM stands for Restricted Boltzmann Machine. It can be seen as a building block of DBNs (Deep Belief Networks), which are multilayered neural networks with an MLP-like form. We should understand the following concepts in order to understand DBNs and RBMs.
i. MLP (Multilayer Perceptron)
ii. Stochastic Modeling
iii. EBM (Energy Based Model)
All three of the concepts mentioned above are important in machine learning and form the theoretical background for DBNs and RBMs. However, the three are basically independent of each other. Short explanations of the concepts are as follows:
i. MLP (Multilayer Perceptron)
Background : MLP is an abbreviation of Multilayer Perceptron and is also known as a multilayered neural network. The earlier single-layer perceptron ran into the XOR problem: whether it used a step function or a hyperbolic tangent/sigmoid function as its activation, it could not represent functions that are not linearly separable. Since an MLP is a “multilayered” neural network (in its basic form, three layers: an input layer, a hidden layer, and an output layer) whose output is a linear combination of nonlinear activation functions, it can approximate any continuous function to within an allowable tolerance, given enough hidden units.
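To make the XOR point concrete, here is a minimal sketch of a three-layer network that computes XOR with a step activation. The weights are hand-picked for illustration and were not part of the seminar; the names `step` and `xor_mlp` are made up for this example.

```python
import numpy as np

def step(z):
    # Heaviside step activation: 1 where z > 0, else 0
    return (z > 0).astype(int)

# Hand-picked (illustrative) weights:
# hidden unit 0 behaves like OR, hidden unit 1 behaves like AND
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])   # shape (inputs, hidden units)
b_hidden = np.array([-0.5, -1.5])
W_out = np.array([1.0, -1.0])       # output fires for "OR but not AND"
b_out = -0.5

def xor_mlp(x):
    h = step(x @ W_hidden + b_hidden)   # input layer -> hidden layer
    return step(h @ W_out + b_out)      # hidden layer -> output layer

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xor_mlp(X))  # [0 1 1 0]
```

No single perceptron can produce this output, because the four XOR points cannot be separated by one straight line. The hidden layer is what makes it possible.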
Theory : Each perceptron in the input layer receives the input data and sends it to the hidden layer. In this process, the data is transformed using two kinds of parameters called ‘weight’ and ‘bias’, which start out with random values. A perceptron in the hidden layer likewise applies its own ‘weight’ and ‘bias’ before sending the result to the output layer.
Machine Learning : A machine learns as the parameters in each layer are adjusted so that the labelled input data maps to the right output data.
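The forward pass and the parameter adjustment described above can be sketched as follows. This is only an illustration under assumptions of our own: sigmoid activations, a squared-error loss, plain gradient descent, 4 hidden units, and a learning rate of 0.5. None of these choices come from the seminar.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, used only for reproducibility

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Labelled training data: the XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialised weights, zero biases (4 hidden units is an arbitrary choice)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

lr = 0.5  # learning rate; constant factors of the gradient are folded into it
losses = []
for _ in range(5000):
    # Forward pass: input -> hidden -> output
    H = sigmoid(X @ W1 + b1)
    out = sigmoid(H @ W2 + b2)
    losses.append(float(np.mean((out - Y) ** 2)))

    # Backward pass: error signal at the output, then at the hidden layer
    d_out = (out - Y) * out * (1 - out)
    d_hid = (d_out @ W2.T) * H * (1 - H)

    # Gradient-descent update of every weight and bias
    W2 -= lr * H.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid
    b1 -= lr * d_hid.sum(axis=0)

print(losses[0], losses[-1])  # the loss before and after training
```

How closely the outputs end up matching the labels depends on the random initialisation and the number of iterations, so treat the numbers as illustrative.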
ii. Stochastic Modeling
iii. EBM (Energy Based Model)
EBM stands for energy based model, as the name suggests. It is not a specific algorithm, but a way of looking at the machine learning process. Given input data Xi and the desired output data Yi, we define a function E(Xi,Yi) of the input and a candidate output, which we call an ‘Energy function’.
In EBM based machine learning, ‘training’ means making E(Xi,Yi)<E(Xi,Yi’) hold, that is, giving the desired output Yi a lower energy than any incorrect output Yi’. To achieve this, it is normal to define a Loss Function and modify the parameters to make it as small as possible.
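As a rough illustration of this view, the sketch below uses a made-up energy function for a two-class problem and a hinge-style loss that pushes the energy of the desired output below that of the wrong output. The energy function, the margin, the toy data, and the names `energy`, `predict`, and `hinge_loss` are all assumptions for this example.

```python
import numpy as np

def energy(w, x, y):
    # Illustrative energy: low when the label y (+1 or -1) agrees with the sign of w.x
    return -y * np.dot(w, x)

def predict(w, x, labels=(-1, 1)):
    # Inference in an EBM: pick the candidate output with the lowest energy
    return min(labels, key=lambda y: energy(w, x, y))

def hinge_loss(w, x, y_true, margin=1.0):
    # Zero only when the wrong label's energy exceeds the correct one's by `margin`
    y_wrong = -y_true
    return max(0.0, margin + energy(w, x, y_true) - energy(w, x, y_wrong))

# Toy, linearly separable data (made up for this sketch)
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
Y = np.array([1, 1, -1, -1])

w = np.zeros(2)
lr = 0.1
for _ in range(20):
    for x, y in zip(X, Y):
        if hinge_loss(w, x, y) > 0:
            # When the margin is violated, the loss gradient w.r.t. w is -2*y*x
            w += lr * 2 * y * x

for x, y in zip(X, Y):
    print(predict(w, x), energy(w, x, y), energy(w, x, -y))
```

After training, the desired label has the lower energy for every example, which is the condition E(Xi,Yi)<E(Xi,Yi’).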
An RBM is a model that mixes these 3 different backgrounds. It is a probability model using the Boltzmann probability distribution. DBNs are a kind of EBM structured with multiple layers and are conceptually similar to MLPs. The learning algorithm is divided into two stages: pre-training and fine tuning.
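To show how the energy view and the Boltzmann distribution fit together, here is a small sketch that enumerates every state of a tiny binary RBM and turns energies into probabilities. It uses the standard binary RBM energy E(v,h) = -a.v - b.h - v.W.h. The sizes, random weights, and variable names are assumptions for illustration only.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 3, 2           # tiny sizes so every state can be enumerated
W = rng.normal(scale=0.5, size=(n_visible, n_hidden))
a = np.zeros(n_visible)              # visible biases
b = np.zeros(n_hidden)               # hidden biases

def rbm_energy(v, h):
    # Standard binary RBM energy: E(v, h) = -a.v - b.h - v.W.h
    return -(a @ v) - (b @ h) - (v @ W @ h)

# All 2^3 * 2^2 = 32 joint binary configurations of (visible, hidden)
states = [(np.array(v), np.array(h))
          for v in itertools.product([0, 1], repeat=n_visible)
          for h in itertools.product([0, 1], repeat=n_hidden)]

energies = np.array([rbm_energy(v, h) for v, h in states])
unnormalised = np.exp(-energies)     # Boltzmann factor e^{-E}
Z = unnormalised.sum()               # partition function
probs = unnormalised / Z             # P(v, h) = e^{-E(v, h)} / Z

print(probs.sum())                   # sums to 1 up to rounding
```

Lower energy means higher probability. For realistic layer sizes the partition function cannot be enumerated like this, which is why sampling methods are needed.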
In the case of the MLP mentioned above, learning runs in the reverse direction by using ‘back propagation’: the error on the training data set is propagated backwards through the output, hidden, and input layers. An MLP with even one hidden layer can approximate almost every function within a given error range, so deep neural networks, which have far more parameters, can easily overfit. That’s why we need the pre-training stage. In the pre-training stage, each RBM is trained with a technique called MCMC (Markov Chain Monte Carlo), specifically Gibbs Sampling, which is a special case of the Metropolis-Hastings algorithm. The pre-training itself is an unsupervised process that uses no label values, and after this process comes fine tuning, which is supervised learning.
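The unsupervised pre-training step can be sketched as below. The seminar summary does not name the exact update rule, so this example assumes one-step contrastive divergence (CD-1), a common choice that uses a single Gibbs sampling step. The data, layer sizes, learning rate, and function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_hidden(v, W, b):
    # P(h = 1 | v) for a binary RBM, followed by a Bernoulli sample
    p = sigmoid(v @ W + b)
    return p, (rng.random(p.shape) < p).astype(float)

def sample_visible(h, W, a):
    # P(v = 1 | h), followed by a Bernoulli sample
    p = sigmoid(h @ W.T + a)
    return p, (rng.random(p.shape) < p).astype(float)

def cd1_step(V, W, a, b, lr=0.1):
    # One Gibbs step: data v0 -> h0 -> reconstruction v1 -> h1
    ph0, h0 = sample_hidden(V, W, b)
    _, v1 = sample_visible(h0, W, a)
    ph1, _ = sample_hidden(v1, W, b)
    n = V.shape[0]
    # Approximate gradient: data statistics minus reconstruction statistics
    W += lr * (V.T @ ph0 - v1.T @ ph1) / n
    a += lr * (V - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return float(np.mean((V - v1) ** 2))   # reconstruction error for monitoring

# Unlabelled toy data: two repeated binary patterns
V = np.array([[1, 1, 0, 0], [1, 1, 0, 0],
              [0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
W = rng.normal(scale=0.1, size=(4, 2))
a = np.zeros(4)
b = np.zeros(2)

errors = [cd1_step(V, W, a, b) for _ in range(1000)]
print(errors[0], errors[-1])
```

No labels appear anywhere in this loop. In a DBN, the weights learned this way would serve as the starting point for supervised fine tuning with back propagation.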