Attention mechanisms in neural networks are a fairly recent development, and we provide some background on them here.
Generally speaking, attention mechanisms allow a network to focus on only a subset of the data provided for a given task.
Being able to pick out the information that is needed at a specific step of a task reduces the amount of information that has to be processed at that step.
recurrent neural networks, convolutional neural networks, encoder, decoder, embedding, weights, memory, reinforcement learning
The idea behind attention mechanisms is motivated by the visual attention of humans.
Rather than processing the entire visual input at once, humans attend to small regions of, for example, a picture one after the other. This keeps the amount of information manageable.
In recurrent networks we usually face the problem that we rely on one final embedding of a sentence in order to generate a response sentence. It would be preferable to attend to specific words in the input sentence before producing the next word of the output. Machine translation in particular benefits from such attention mechanisms, since corresponding words in the input and output sentences may be far apart.
These correspondences are learned by keeping the embeddings of the individual words and attending to them through learned weights.
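The weighting over kept word embeddings can be sketched in a few lines. This is a minimal illustration, not a specific published model: the names `attend`, `query` and `word_states` are hypothetical, the embeddings are hand-picked toy values, and a plain dot product stands in for the scoring function that a real network would learn.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, word_states):
    """Return (weights, context) for one decoder step.

    query       -- current decoder state (assumed, list of floats)
    word_states -- one kept embedding per input word
    """
    # Assumption: a dot product replaces the learned scoring function.
    scores = [dot(query, h) for h in word_states]
    weights = softmax(scores)
    dim = len(word_states[0])
    # The context is the attention-weighted average of the word embeddings.
    context = [sum(w * h[i] for w, h in zip(weights, word_states))
               for i in range(dim)]
    return weights, context

# Toy input: three input words with hand-picked 2-d embeddings.
word_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 2.0]
weights, context = attend(query, word_states)
print(weights)   # the second and third word receive equal, larger weights
print(context)
```

Instead of a single final sentence embedding, the decoder would receive a fresh `context` vector at every output step, computed from whichever input words currently score highest.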
Quite recently, the idea of using bidirectional LSTMs to generate better word embeddings has appeared.
Similarly, in image recognition tasks that, for example, should generate a caption or a sentence from a picture, attention mechanisms make it possible to extract the important information from different regions.
Usually, a convolutional encoder generates a hidden representation that is fed to a recurrent decoder, which produces the description.
Here, an attention mechanism makes it possible to see explicitly where the algorithm is looking before it generates the next word of the image description.
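To illustrate how such weights show where the model is looking, the sketch below arranges attention weights over image regions as a grid and reports the most-attended cell. The region features and decoder state are made-up toy values (in a real captioning model they would come from the convolutional encoder and the recurrent decoder), the function names are hypothetical, and the dot-product scoring is again only a stand-in for a learned function.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_map(region_features, decoder_state, rows, cols):
    """Attention weights over image regions as a rows x cols grid.

    region_features -- one feature vector per region, in row-major order
    decoder_state   -- current state of the caption decoder (assumed)
    """
    # Assumption: dot-product scoring instead of a learned scoring network.
    scores = [sum(f * s for f, s in zip(feat, decoder_state))
              for feat in region_features]
    weights = softmax(scores)
    return [weights[r * cols:(r + 1) * cols] for r in range(rows)]

def most_attended(grid):
    """Return (row, col) of the region with the largest weight."""
    best = max((w, r, c) for r, row in enumerate(grid)
               for c, w in enumerate(row))
    return best[1], best[2]

# Toy 2 x 2 image: four regions with hand-picked 2-d features.
region_features = [[0.1, 0.0], [0.2, 0.1],
                   [0.9, 0.8], [0.0, 0.3]]
decoder_state = [1.0, 1.0]

grid = attention_map(region_features, decoder_state, rows=2, cols=2)
for row in grid:
    print(" ".join(f"{w:.2f}" for w in row))
print(most_attended(grid))  # (1, 0): the bottom-left region
```

Plotting such a grid over the input image for each generated word gives the familiar picture of the model "looking" at different regions as the caption unfolds.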
Another example in the field of natural language processing is question answering on a larger text sample.
Again, the weights of the attention mechanism make it possible to follow the information extraction and to track the exact spots in the text where the algorithm looks for the answer.
The attention mechanisms described above have recently become quite popular because they are easy to implement in existing architectures. A minor criticism, however, concerns the additional training time needed to learn the attention weights. Instead of making it possible to work with less information, the mechanism first has to look at all the data in order to decide which region to focus on. Intuitively, this is not how human attention works; it might rather be interpreted as storing previous information in a memory that can be read later. This reveals a close relation between such mechanisms and recent attempts to connect external memory to neural networks. Another direction for attention mechanisms is to introduce some form of reinforcement learning.
This would mean that an algorithm has to learn through trial and error to focus on certain information without observing the whole picture at once.
The need for such attention mechanisms arises with high-resolution pictures, which cannot be processed by current convolutional networks simply because of the huge number of input pixels.
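The trial-and-error idea can be sketched with a toy policy that learns where to look while reading only one patch per step. Everything here is a hypothetical setup for illustration: the "image" is reduced to eight patch indices, the reward is 1 when the glimpsed patch is the one containing the target, and the update is a plain REINFORCE step without a baseline. In a real system the reward would come from task performance.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_glimpse_policy(num_patches, target_patch,
                         steps=500, lr=0.5, seed=0):
    """Learn a distribution over patches from reward alone."""
    rng = random.Random(seed)
    logits = [0.0] * num_patches  # start with a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one patch to look at; the rest of the image is never read.
        patch = rng.choices(range(num_patches), weights=probs)[0]
        # Hypothetical reward: 1 if the glimpse hit the target patch.
        reward = 1.0 if patch == target_patch else 0.0
        # REINFORCE: d log pi(patch) / d logit_i = 1[i == patch] - probs[i]
        for i in range(num_patches):
            indicator = 1.0 if i == patch else 0.0
            logits[i] += lr * reward * (indicator - probs[i])
    return softmax(logits)

probs = train_glimpse_policy(num_patches=8, target_patch=5)
best = max(range(len(probs)), key=probs.__getitem__)
print(best)  # index of the patch the learned policy prefers
```

Unlike the weighted average used in soft attention, each step touches a single patch, which is what would make this direction attractive for inputs too large to process as a whole. The price is a noisier learning signal, since the policy only learns from the glimpses it happens to sample.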
In the future we will likely see attention mechanisms applied to dialog systems in one form or another.