
Memory Neural Networks (MemNN)

This summary provides a rough explanation of memory neural networks.
In particular, we focus on existing architectures with external memory components.


Many tasks, such as the bAbI tasks, require a long-term memory component in order to understand longer passages of text, like stories.
More generally, QA tasks demand accessing memories in a wider context, such as past utterances that date back several days or even weeks.
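A toy bAbI-style example (illustrative, not taken from the actual dataset) makes the need for long-term memory concrete: the answer depends on a fact stated earlier in the story, and specifically on the most recent fact about the queried entity.

```python
# A small bAbI-style task: three facts, then a question whose answer
# requires remembering and selecting the right earlier statement.
story = [
    "Mary moved to the bathroom.",
    "John went to the hallway.",
    "Mary travelled to the office.",
]
question = "Where is Mary?"
answer = "office"  # only the most recent fact about Mary is relevant

print(answer)
```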


Keywords: external memory, RNN, LSTM, embedding model, scoring function, softmax, hops.


Neural networks in general rely on storing information about the training data in the weights of their hidden layers.
However, current architectures, such as RNNs and LSTMs, only grant access to information seen a few steps in the past.
The idea of memory networks is to provide an external memory component in which past utterances of a speaker can be stored.
In this way, access to the information relevant for producing a response is eased.

The mechanism for writing to, updating and reading from the memory is crucial for determining the range of manageable tasks.
In the simplest realization, the memory component consists of an input feature map that converts the incoming data into a feature representation.
In a second step, the generalization map stores this representation in the next slot of the memory.
More generally, one can think of updating the memory by grouping memories by topic or even forgetting redundant memories.
The output feature map takes care of retrieving the memories relevant to a certain query, such as a question, from all stored memories.
This might involve scoring several memory entries in order to find the appropriate information.
In the final step, the gathered information is transformed into a response, in the form of a single word or a complete sentence.
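The four steps above can be sketched as follows. This is a minimal toy sketch, assuming bag-of-words features over a tiny vocabulary; the function names and the retrieval-by-dot-product scoring are illustrative assumptions, not a faithful reproduction of any published implementation.

```python
import numpy as np

# Illustrative vocabulary for the toy stories below.
VOCAB = ["john", "went", "to", "the", "kitchen", "mary", "garden", "where", "is"]

def input_feature_map(sentence):
    """Input feature map: convert raw text into a feature vector (bag of words)."""
    vec = np.zeros(len(VOCAB))
    for word in sentence.lower().split():
        if word in VOCAB:
            vec[VOCAB.index(word)] += 1
    return vec

def generalization(memory, feature):
    """Generalization: store the representation in the next memory slot."""
    memory.append(feature)
    return memory

def output_feature_map(memory, query):
    """Output feature map: score all memories against the query, keep the best."""
    scores = [float(np.dot(m, query)) for m in memory]
    return memory[int(np.argmax(scores))]

def response(best_memory):
    """Response: turn the retrieved memory back into words (here: its tokens)."""
    return [VOCAB[i] for i in np.flatnonzero(best_memory)]

memory = []
for s in ["John went to the kitchen", "Mary went to the garden"]:
    memory = generalization(memory, input_feature_map(s))

best = output_feature_map(memory, input_feature_map("Where is Mary"))
print(response(best))  # tokens of the sentence mentioning Mary
```

Note that the hard `argmax` retrieval here is only for illustration; as discussed below, a trainable network replaces it with a differentiable scoring step.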

In order to apply standard training methods to this kind of architecture, the different memory components have to satisfy certain criteria.
First, writing to and reading from the memory, as well as scoring the relevant memory entries, have to be differentiable operations in order to allow gradient descent optimization.
This is achieved by using embedding models and softmax functions in every step.
Secondly, the scoring mechanism should only consider memory entries preselected by some kind of hashing, since scoring all entries of a large memory is too time-consuming.
A further variable is the number of hops, which describes how many times we score the next relevant entry depending on previously scored entries.
For short stories a small number of hops might be sufficient to connect the relevant pieces of information; with more hops, however, learning might become too difficult.
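Differentiable multi-hop addressing can be sketched as below, in the spirit of end-to-end memory networks: each hop computes a softmax over memory scores (a "soft" address instead of a hard lookup), reads a weighted sum of memories, and updates an internal state. The dimensions, random embeddings, and number of hops are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_mem, n_hops = 8, 5, 2

memories = rng.normal(size=(n_mem, d))  # embedded memory entries
query = rng.normal(size=d)              # embedded question

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

u = query
for hop in range(n_hops):
    scores = memories @ u  # match each memory against the current state
    p = softmax(scores)    # differentiable soft address over memory slots
    o = p @ memories       # weighted read from memory
    u = u + o              # update the internal state for the next hop

print(u.shape)  # final state; an output layer would map it to an answer
```

Because every step (matrix product, softmax, weighted sum) is differentiable, gradients can flow from the answer loss back through all hops into the embeddings.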

Several extensions to this simple model have been proposed, such as recording the writing time of memory slots, which is necessary to model the course of events.
As with every neural network, the problem of unseen words, such as entity names, has to be addressed.
Many open questions are also linked to the updating mechanism of the memory, e.g. forgetting redundant information.

Such models have so far been trained on a range of tasks, stretching from the standard bAbI tasks to QA tasks and recommendations on a movie database.
Astonishingly, the performance does not suffer substantially between the different tasks.
In particular, the memory component makes it possible to learn to answer questions over a large number of word-relation triples.
For the bAbI tasks, the labeling of supporting sentences is of great importance.
In general, one can say that memory networks outperform standard LSTM models on tasks where a long-term memory component is required.

Interesting related models that differ from the described architecture but are capable of addressing similar tasks include knowledge-based models and RNN models that use some form of alignment.
At the moment, Neural Turing Machines, which use a more sophisticated form of interaction with an external memory, are being tested on simple copying, recalling and sorting tasks.
However, they might become useful in the near future.

References

"Memory Networks" (PDF). Retrieved March 2, 2016.
"Memory Networks" (PDF). Nov 2015. Retrieved March 2, 2016.
"End-To-End Memory Networks" (PDF). Nov 2015. Retrieved March 2, 2016.
"Recurrent Memory Network for Language Modeling" (PDF). Jan 2016. Retrieved March 2, 2016.
