Meet Musio

Sequence-to-Sequence

In this summary I would like to provide a rough overview of Sequence-To-Sequence neural network architectures and the purposes they serve.

Motivation
A key observation when dealing with standard neural networks is that they can only handle inputs of a fixed size.
This means that the architecture has to be adapted if sequences such as sentences are to be processed.
The same problem with objects of variable length also appears on the dialog level, where a varying number of utterances and responses are strung together. Besides dialog modeling, speech recognition and machine translation also demand more advanced neural networks.

Ingredients
Deep neural network, hidden layer, recurrent neural network, encoder, decoder, LSTM, back-propagation, word embedding, sentence embedding

Steps
As already stated, standard neural networks cannot deal with sequences of variable length.
Moreover, they have no knowledge of previous inputs.
This knowledge is very important, however, for understanding sentences, for example.

For these reasons, altered neural network architectures were proposed and pursued.
A first step is the recurrent neural network, which can process variable-length sequences of fixed-size objects, such as the words in a sentence.
It also solves the problem of keeping knowledge about previous inputs by passing a state along in the hidden layer.
Because of mathematical shortcomings of standard neurons (gradients tend to vanish or explode during training), this memory cannot reach arbitrarily far back to earlier inputs.
One proposal to deal with these memory issues is to replace the neurons with Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) cells.
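The recurrence described above can be sketched in a few lines of plain Python. This is only a minimal illustration with a simple tanh unit and random, untrained weights. The names `rnn_step` and `run_rnn` and the sizes `EMBED` and `HIDDEN` are assumptions made for this example and do not come from any particular library.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rnn_step(x, h, W_xh, W_hh):
    """One recurrent step: h_new = tanh(W_xh x + W_hh h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_xh[i], x))
            + sum(w * hi for w, hi in zip(W_hh[i], h))
        )
        for i in range(len(h))
    ]

def run_rnn(sequence, hidden_size, W_xh, W_hh):
    """Feed fixed-size vectors one at a time, carrying the hidden state along."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = rnn_step(x, h, W_xh, W_hh)
    return h

rng = random.Random(0)
EMBED, HIDDEN = 3, 4  # illustrative sizes only
W_xh = make_matrix(HIDDEN, EMBED, rng)
W_hh = make_matrix(HIDDEN, HIDDEN, rng)

# Two "sentences" of different length, each word a fixed-size vector.
short_sentence = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(2)]
long_sentence = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(7)]

h_short = run_rnn(short_sentence, HIDDEN, W_xh, W_hh)
h_long = run_rnn(long_sentence, HIDDEN, W_xh, W_hh)
print(len(h_short), len(h_long))  # both equal HIDDEN
```

The same weights are reused at every position, which is why the sequence length does not have to be fixed in advance. An LSTM or GRU cell would replace `rnn_step` with a gated update, while the loop in `run_rnn` would stay the same.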

Only in recent years have such tuned recurrent neural networks been used as encoders and decoders in the framework of Sequence-To-Sequence architectures.
The encoder maps an input, e.g. a sequence of words, to a fixed-size vector by processing it word by word.
The resulting vector can then be regarded as a sentence embedding, which stores the meaning of the sentence in abstract form.
In a second step, the decoder maps this abstract vector to an output sequence by emitting one word at a time.
In this way the network architecture is able to reply to an utterance with a response.
Last year this concept was generalized by adding a dialog encoder layer on top of the standard encoder.
This may further enable the architecture to keep track of previous utterances in a full dialog.
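The encoder and decoder roles can be illustrated with a toy example. The sketch below assumes a tiny made-up vocabulary, random untrained weights, a plain tanh unit instead of LSTM or GRU cells, and greedy word selection. The helper names (`encode`, `decode`, `step`) and the use of `<eos>` as both start and stop token are choices made for this illustration only, so the generated words are meaningless until the weights are trained.

```python
import math
import random

VOCAB = ["<eos>", "hello", "how", "are", "you", "fine", "thanks"]
EMBED, HIDDEN = 4, 5  # illustrative sizes only
rng = random.Random(1)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def step(x, h, W_x, W_h):
    """Plain tanh recurrent step (an LSTM or GRU cell would replace this)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]

# Random word embeddings and weights; a real system would learn these.
embedding = {w: [rng.uniform(-1, 1) for _ in range(EMBED)] for w in VOCAB}
enc_Wx, enc_Wh = rand_matrix(HIDDEN, EMBED), rand_matrix(HIDDEN, HIDDEN)
dec_Wx, dec_Wh = rand_matrix(HIDDEN, EMBED), rand_matrix(HIDDEN, HIDDEN)
out_W = rand_matrix(len(VOCAB), HIDDEN)

def encode(words):
    """Read the input word by word and return a fixed-size vector."""
    h = [0.0] * HIDDEN
    for w in words:
        h = step(embedding[w], h, enc_Wx, enc_Wh)
    return h

def decode(thought, max_len=6):
    """Emit one word at a time, starting from the encoder's vector."""
    h, prev, output = thought, "<eos>", []
    for _ in range(max_len):
        h = step(embedding[prev], h, dec_Wx, dec_Wh)
        scores = matvec(out_W, h)
        prev = VOCAB[scores.index(max(scores))]  # greedy choice
        if prev == "<eos>":
            break
        output.append(prev)
    return output

thought = encode(["hello", "how", "are", "you"])
response = decode(thought)
print(len(thought), response)
```

Whatever the length of the input, `encode` returns a vector of length `HIDDEN`, and that vector is the only information the decoder receives about the utterance.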

Like every machine learning system, a Sequence-To-Sequence architecture has to undergo a training process.
Here, the encoder and the decoder are trained jointly by presenting them with corresponding pairs of input and output sequences.
The optimization methods based on back-propagation can be carried over from standard neural networks.
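To make the training signal more concrete, the sketch below computes a loss for one sequence pair. It assumes the commonly used average cross-entropy (negative log-likelihood) of the target words under the decoder's softmax output; the text above does not specify the loss, so this choice, the function name `sequence_loss`, and the example scores are assumptions. Back-propagation would then adjust the encoder and decoder weights to reduce this value.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_loss(step_scores, target_ids):
    """Average negative log-likelihood of the target words, one per step."""
    assert len(step_scores) == len(target_ids)
    nll = 0.0
    for scores, target in zip(step_scores, target_ids):
        probs = softmax(scores)
        nll -= math.log(probs[target])
    return nll / len(target_ids)

# Hypothetical decoder scores over a 3-word vocabulary for a 2-word response.
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # decoder has no preference
confident = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]  # decoder favours the targets
targets = [0, 1]

print(sequence_loss(uniform, targets))    # log(3), about 1.0986
print(sequence_loss(confident, targets))  # much closer to 0
```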

As with every deep neural network, the amount of available training data is crucial for achieving good performance.
Interesting data sets for dialog modeling include movie subtitle corpora and scripts of theater plays and TV series.
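One simple way to turn such a corpus into training material is to treat each line as an utterance and the line that follows it as the response. The sketch below assumes exactly that; `make_pairs` and the example lines are invented for illustration. Since subtitle files usually carry no speaker labels, pairs built this way are noisy.

```python
def make_pairs(lines):
    """Pair each non-empty line with the non-empty line that follows it."""
    cleaned = [line.strip() for line in lines if line.strip()]
    return list(zip(cleaned, cleaned[1:]))

# Invented example lines standing in for a subtitle file.
subtitles = [
    "Where are you going?",
    "To the station.",
    "",
    "Can I come with you?",
    "Sure.",
]

pairs = make_pairs(subtitles)
for utterance, response in pairs:
    print(utterance, "->", response)
```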

Recent results show that such architectures are able to model dialogs of the form described above reasonably well.
On specific data sets, such as IT help-desk conversations, a trained model can sometimes address a given technical problem properly.

With a focus on Musio and its emotional abilities, it is important to create specific data sets that capture appropriate emotional responses.

Resources
"Sequence to Sequence Learning with Neural Networks" (PDF). December 2014. Retrieved February 26, 2016.
"Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation" (PDF). September 2014. Retrieved February 26, 2016.
"Building End-To-End Dialogue Systems: Using Generative Hierarchical Neural Network Models" (PDF). November 2015. Retrieved February 26, 2016.
"Recurrent Continuous Translation Models" (PDF). October 2013. Retrieved February 26, 2016.
"Generating Sequences With Recurrent Neural Networks" (PDF). June 2014. Retrieved February 26, 2016.
"Neural Machine Translation by Jointly Learning to Align and Translate" (PDF). April 2015. Retrieved February 26, 2016.
"Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks" (PDF). 2006. Retrieved February 26, 2016.
"Hierarchical Encoder Decoder for Dialog Modelling" (GIT). Retrieved February 26, 2016.
