Alternatives to the softmax layer
Table of Contents
1. Alternatives to the softmax layer
  1.1. goal
  1.2. motivation
  1.3. ingredients
  1.4. steps
  1.5. outlook
  1.6. resources

goal

This week's post deals with some possible alternatives to the softmax layer when calculating probabilities for words over large vocabularies.

motivation

Natural language tasks such as neural machine translation or dialogue generation rely on word embeddings at the input and output layer. Furthermore, for decent performance a very large vocabulary is needed to reduce the number of out-of-vocabulary words that cannot be properly embedded and therefore cannot be processed. The natural language models used for these tasks usually come with a final softmax layer to compute the probabilities over the words in the vocabulary.
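To make the cost of that final layer concrete, here is a minimal NumPy sketch of a full softmax output layer; the dimensions, variable names, and random weights are illustrative assumptions, not taken from any particular model. For every predicted token the hidden state is multiplied with the complete output embedding matrix, so the work grows linearly with the vocabulary size, which is what motivates looking for alternatives once the vocabulary gets large.

import numpy as np

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    z = logits - np.max(logits)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

# illustrative sizes: hidden state of 256 units, vocabulary of 50k words
hidden_dim, vocab_size = 256, 50_000
rng = np.random.default_rng(0)

h = rng.standard_normal(hidden_dim)                 # decoder hidden state for one time step
W = rng.standard_normal((vocab_size, hidden_dim))   # output embedding matrix
b = np.zeros(vocab_size)                            # output bias

# the full softmax scores every word in the vocabulary:
# one |V| x d matrix-vector product per predicted token
probs = softmax(W @ h + b)
print(probs.shape, probs.sum())  # (50000,) and a total probability of ~1.0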