{"id":1048,"date":"2016-03-09T10:56:13","date_gmt":"2016-03-09T10:56:13","guid":{"rendered":"http:\/\/blog.themusio.com\/?p=1048"},"modified":"2024-05-01T11:51:32","modified_gmt":"2024-05-01T02:51:32","slug":"memory-neural-networks-memnn","status":"publish","type":"post","link":"https:\/\/blog.themusio.com\/?p=1048","title":{"rendered":"Memory Neural Networks: MemNN"},"content":{"rendered":"<p><strong>Goal<\/strong><br \/>\nThis summary tries to provide a rough explanation of memory neural networks.<br \/>\nIn particular, we focus on the existing architectures with external memory components.<br \/>\n<strong><br \/>\nMotivation<\/strong><br \/>\nMany tasks, such as the bAbI tasks, require a long-term memory component in order to understand longer passages of text, like stories.<br \/>\nMore generally, QA tasks demand accessing memories in a wider context, such as past utterances which date back several days or even weeks.<br \/>\n<strong><br \/>\nIngredients<\/strong><br \/>\nExternal memory, RNN, LSTM, Embedding model, Scoring function, Softmax, Hops.<br \/>\n<strong><br \/>\nSteps<\/strong><br \/>\nNeural networks in general rely on storing information about training data in the weights of their hidden layers.<br \/>\nHowever, current architectures, such as RNN and LSTM, limit access to information seen in the past to several steps.<br \/>\nThe idea of memory networks is to provide external memory components in order to store past utterances of a speaker.<br \/>\nIn this way, access to the information relevant for coming up with a response is eased.<\/p>\n<p>The mechanism for writing to, updating and reading from the memory is crucial for determining the range of manageable tasks.<br \/>\nIn the simplest realization the memory component consists of an input feature map that converts the incoming data into a feature representation.<br \/>\nIn a second step the generalization map stores this representation in the next slot of the memory.<br \/>\nMore generally, 
one can think of updating the memory by grouping memories by topic or even forgetting about redundant memories.<br \/>\nThe output feature map takes care of retrieving the relevant memories from all stored memories with regard to a certain query, such as a question.<br \/>\nThis might involve scoring several times on different memory entries in order to find the appropriate information.<br \/>\nIn the final step, the gathered information is transformed into a response, in the form of a single word or a complete sentence.<\/p>\n<p>In order to apply standard training methods to this kind of architecture the different memory components have to satisfy certain criteria.<br \/>\nFirst, writing to and reading from the memory, as well as scoring the relevant memory entries, have to be differentiable operations in order to allow gradient descent optimization.<br \/>\nThis is achieved by using embedding models and softmax functions in every step.<br \/>\nSecondly, the scoring mechanism should only consider preselected memory entries gathered by some kind of hashing, since scoring all entries of a large memory is too time consuming.<br \/>\nA further variable is the number of hops, which describes how many times we score a next relevant entry depending on previously scored entries.<br \/>\nFor short stories a small number of hops might be sufficient to connect the relevant pieces of information; however, learning might become too difficult with more hops.<\/p>\n<p>Several extensions to this simple model have been proposed, such as specifying the writing time of memory slots, since this is necessary to model the course of events.<br \/>\nAs with every neural network, the problem of unseen words, like entity names, has to be treated.<br \/>\nA lot of open questions are also linked to the updating mechanism of the memory, e.g. 
forgetting redundant information.<\/p>\n<p>Such models have been trained so far on different tasks, ranging from the standard bAbI tasks to QA tasks and recommendations on a movie database.<br \/>\nAstonishingly, the performance does not suffer substantially across the different tasks.<br \/>\nIn particular, the memory component makes it possible to learn to answer questions over a large number of word-relation triples.<br \/>\nFor the bAbI tasks, the labeling of supporting sentences is of great importance.<br \/>\nIn general, one can say that memory networks outperform the standard LSTM models on tasks where a long-term memory component is required.<\/p>\n<p>Interesting related models that differ from the described architecture but are capable of addressing similar tasks include, for example, knowledge-based models and RNN models which use some form of alignment.<br \/>\nAt the moment, Neural Turing Machines, which use a more sophisticated form of interacting with an external memory, are so far tested on simple copying, recalling and sorting tasks.<br \/>\nHowever, they might become useful in the near future.<\/p>\n<p><strong>Resources<\/strong><br \/>\n&#8220;<a href=\"https:\/\/cs224d.stanford.edu\/lectures\/CS224d-Lecture12.pdf\" target=\"_blank\" rel=\"noopener\">Memory Networks<\/a>&#8221; (PDF). <em>Memory Networks<\/em>.\u00a0Retrieved March\u00a02, 2016.<br \/>\n&#8220;<a href=\"http:\/\/arxiv.org\/pdf\/1410.3916v11.pdf\" target=\"_blank\" rel=\"noopener\">MEMORY NETWORKS<\/a>&#8221; (PDF). <em>MEMORY NETWORKS<\/em>. Nov 2015.\u00a0Retrieved March\u00a02, 2016.<br \/>\n&#8220;<a href=\"http:\/\/arxiv.org\/pdf\/1503.08895v5.pdf\" target=\"_blank\" rel=\"noopener\">End-To-End Memory Networks<\/a>&#8221; (PDF). <em>End-To-End Memory Networks<\/em>. 
Nov 2015.\u00a0Retrieved March\u00a02, 2016.<br \/>\n&#8220;<a href=\"http:\/\/arxiv.org\/pdf\/1601.01272v1.pdf\" target=\"_blank\" rel=\"noopener\">Recurrent Memory Network for Language Modeling<\/a>&#8221; (PDF).\u00a0<em>Recurrent Memory Network for Language Modeling<\/em>. Jan 2016.\u00a0Retrieved March\u00a02, 2016.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Goal This summary tries to provide a rough explanation of memory neural networks. In particular, we focus [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[3642,3640],"tags":[3650,3652,3760,3656,3762,3658,3700,3664,4270,4272,4274,4276,4278,3852,4280,3710,4020,3862,3876,4282,3878,4284,3884,4286,3886],"class_list":["post-1048","post","type-post","status-publish","format-standard","hentry","category-ai-en","category-all-en","tag-ai-ja-en","tag-aka-ja-en","tag-artificial-intelligence-en","tag-baggage-en","tag-children-book-ja-en","tag-christmas-en","tag-cmos-en","tag-crowd-funding-en","tag-embedding-model-en","tag-external-memory-en","tag-hinoki-en","tag-hops-en","tag-lstm-en","tag-lstm-model-en","tag-memory-neural-networks-en","tag-musio-en","tag-observable-variables-en","tag-oid-sensor-en","tag-out-of-vocabulary-words-en","tag-paper-art-en","tag-parents-en","tag-party-en","tag-rnn-en","tag-scoring-function-en","tag-softmax-en"],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1048","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/
\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1048"}],"version-history":[{"count":3,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1048\/revisions"}],"predecessor-version":[{"id":10899,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1048\/revisions\/10899"}],"wp:attachment":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1048"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1048"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1048"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}