AKA Story

Covering rare words

Table of Contents: 1. Covering rare words, 1.1. goal, 1.2. motivation, 1.3. ingredients, 1.4. steps, 1.5. outlook, 1.6. resources

goal This week's blogpost presents a new network architecture, called pointer models, for taking care of rare words. We will dive into some details of the implementation and give a short analysis of the benefits of this kind of model.

motivation Motivation for the introduction of new architectures comes directly from shortcomings of RNN language models as well as encoder-decoder frameworks. Rare words, especially named entities, do not receive good word embeddings and hence do not lead to appropriate sentence embeddings, which might be used to initialize a decoder component for predicting an output sequence. Furthermore, the […]
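Since the excerpt is cut off, a rough sketch may still help make the pointer idea concrete. The snippet below mixes an ordinary vocabulary softmax with a copy distribution obtained from attention over the context words, which is the basic mechanism behind pointer(-softmax) style models; the names (pointer_mixture, gate, context_ids) and the fixed mixing gate are illustrative assumptions, not the post's actual implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_mixture(vocab_logits, attn_scores, context_ids, vocab_size, gate):
    """Mix the vocabulary softmax with a copy distribution over context words.

    gate in [0, 1] decides how much probability mass goes to the vocabulary
    softmax versus pointing at (copying) a word from the context window.
    (Illustrative sketch; in a real model the gate is predicted by the network.)
    """
    p_vocab = softmax(vocab_logits)              # distribution over the vocabulary
    p_copy_ctx = softmax(attn_scores)            # attention over context positions
    p_copy = np.zeros(vocab_size)
    for pos, word_id in enumerate(context_ids):  # scatter attention mass onto word ids
        p_copy[word_id] += p_copy_ctx[pos]
    return gate * p_vocab + (1.0 - gate) * p_copy

# toy example: 10-word vocabulary, 4-word context
rng = np.random.default_rng(0)
p = pointer_mixture(rng.normal(size=10), rng.normal(size=4),
                    context_ids=[2, 7, 7, 3], vocab_size=10, gate=0.6)
print(p.sum())  # ~1.0
```

In a trained model the gate itself would be computed from the decoder state, so the network can decide at each time step whether to generate from the vocabulary or to point at a rare word in the context.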

Alternatives to the softmax layer

Table of Contents: 1. Alternatives to the softmax layer, softmax, 1.1. goal, 1.2. motivation, 1.3. ingredients, 1.4. steps, 1.5. outlook, 1.6. resources

goal This week's post deals with some possible alternatives to the softmax layer when calculating probabilities for words over large vocabularies.

motivation Natural language tasks such as neural machine translation or dialogue generation rely on word embeddings at the input and output layer. Furthermore, for decent performance a very large vocabulary is needed to reduce the number of out-of-vocabulary words that cannot be properly embedded and therefore cannot be processed. The natural language models used for these tasks usually come with a final softmax layer to compute the probabilities over the words in the […]
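As the excerpt breaks off before the actual alternatives, here is a hedged sketch of one common option, sampled softmax, which scores the target word against only a small set of randomly sampled negative words instead of the full vocabulary. The function name and parameters are illustrative, and the correction for the sampling distribution is omitted for brevity.

```python
import numpy as np

def sampled_softmax_loss(hidden, output_emb, target_id, num_samples, vocab_size, rng):
    """Approximate the full-vocabulary softmax loss by normalizing over the
    target word plus a handful of sampled negatives (simplified sketch)."""
    negatives = rng.choice(vocab_size, size=num_samples, replace=False)
    negatives = negatives[negatives != target_id]        # avoid duplicating the target
    candidates = np.concatenate(([target_id], negatives))
    logits = output_emb[candidates] @ hidden             # score only the candidate words
    logits -= logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                                 # negative log-likelihood of the target

rng = np.random.default_rng(0)
V, d = 50000, 64                                         # large vocabulary, small hidden size
output_emb = rng.normal(size=(V, d)) * 0.01
hidden = rng.normal(size=d)
print(sampled_softmax_loss(hidden, output_emb, target_id=123,
                           num_samples=20, vocab_size=V, rng=rng))
```

The point of the approximation is that training cost no longer scales with the full vocabulary size; at test time the full softmax (or another approximation) is still needed to obtain proper probabilities.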

Memory Neural Networks: MemNN

Goal This summary tries to provide a rough explanation of memory neural networks. In particular, we focus on the existing architectures with external memory components.

Motivation Many tasks, such as the bAbI tasks, require a long-term memory component in order to understand longer passages of text, like stories. More generally, QA tasks demand accessing memories in a wider context, such as past utterances which date back several days or even weeks.

Ingredients External memory, RNN, LSTM, embedding model, scoring function, softmax, hops.

Steps Neural networks in general rely on storing information about training data in the weights of their hidden layers. However, current architectures, such as RNNs and LSTMs, limit access to information seen in the past […]
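To connect the listed ingredients (embedding model, scoring function, softmax, hops), the following is a minimal sketch in the spirit of end-to-end memory networks: story sentences are embedded into memory slots, scored against the query embedding, and read out with a softmax over memory, repeated over several hops. The embedding matrices and the function name are illustrative assumptions, not the exact formulation of any particular paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hops(story_ids, query_ids, emb_A, emb_C, emb_B, hops=3):
    """Simplified memory-network read: embed the story into memory, score it
    against the query, and refine the query representation over several hops."""
    memories = np.array([emb_A[s].sum(axis=0) for s in story_ids])  # input memory vectors
    outputs  = np.array([emb_C[s].sum(axis=0) for s in story_ids])  # output memory vectors
    u = emb_B[query_ids].sum(axis=0)                                # query embedding
    for _ in range(hops):
        p = softmax(memories @ u)   # attention (softmax scores) over memory slots
        o = p @ outputs             # weighted read from memory
        u = u + o                   # update the controller state for the next hop
    return u

rng = np.random.default_rng(0)
V, d = 100, 20
emb_A, emb_B, emb_C = (rng.normal(size=(V, d)) * 0.1 for _ in range(3))
story = [[3, 7, 9], [1, 4], [5, 5, 8]]   # three sentences as word ids
query = [7, 9]
print(memory_hops(story, query, emb_A, emb_C, emb_B).shape)  # (20,)
```

Multiple hops let the model chain facts from different sentences, which is exactly what the longer bAbI stories mentioned above require.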