Meet Musio

Autoencoder

Goal
Autoencoder have long been proposed to tackle the problem of unsupervised learning.
In this week’s summary we have a look at their capabilities of providing a features that can be successfully used in supervised tasks and sketch their framework architecture.

Motivation
In supervised learning, back in the days, deeper architectures need some kind of pretraining of layers before the actual supervised tasked could be pursued.
Autoencoder came in handy for this and allowed to train one layer after the other and were able to find useful features for the supervised learning.

Ingredients
unsupervised learning, features, representation, encoder, decoder, denoising

Steps
Let us start by looking at the general architecture.
An autoencoder consists of two basic parts: the encoder and the decoder.
For both there are a variety of options for a specific neural network.
Depending on the the task, either a standard feed forward network, a convolution neural network or a recurrent network might be the best choice.
In the encoder part we try to build a hidden representation of the input data and in the decoder we try to reconstruct the original data point.

This architecture allows to perform a learning task on unlabeled data which allows to encode useful features of the data in the hidden representation.
In this sense the autoencoder is an approach to unsupervised learning.
Now since the encoder has to reconstruct the input data as best as possible, the encoder is forced to provide meaningful features.
If the encoder part reduces the dimensionality of the data, we speak of dense representation.
Dimensionality reduction is a very important tasks in applications as reducing the size of images or audio files.

On the other hand an encoder that increases the dimensionality towards the hidden representation is able to learn sparse representations if we regularize the activation of the layers to be close to zero.
An obvious problem occurs if we forget to apply a regularization method in the situation where we keep the dimensionlity of the input data constant throughout the encoder.
Then the encoder is allowed to simply fall back on using the identity and the representation does not contain any meaningful features.

Since autoencoders try to captures as much information as possible about the data in the hidden representation they are vulnerable to overfitting.
Their capability to generalize to new data points is limited, since the learned representations are not very robust.
A way to prevent them from overfitting is to apply a certain noise signal to the input data before encoding.
The decoder now has to learn to reconstruct the input data not only from a single hidden representation, but a fuzzy sphere around it.
Sometimes this decoding step is also called a denoising.
In general the gained representations of the data from the denoising autoencoder capture better features than the standard autoencoder.

Outlook
Autoencoder did not really play a role in the recent deep learning development after the pretraining phase in supervised learning was no longer necessary.
However recently, there is renewed interest since generative models became popular.
Applying small modifications, such as variational methods, to the autoencoder their framework can be used to generate images or audio that are similar in style to the training data.
Another remarkable progress has been achieved on the task of semi-supervised learning with only a few labeled data points using autoencoder architectures parallel to the actual classification network.

Leave a Reply