{"id":2055,"date":"2017-01-26T19:14:18","date_gmt":"2017-01-26T10:14:18","guid":{"rendered":"http:\/\/blog.themusio.com\/?p=2055"},"modified":"2024-05-01T10:53:44","modified_gmt":"2024-05-01T01:53:44","slug":"adversarial-techniques-for-dialogue-generation","status":"publish","type":"post","link":"https:\/\/blog.themusio.com\/?p=2055","title":{"rendered":"Adversarial techniques for dialogue generation"},"content":{"rendered":"<div id=\"table-of-contents\">\n<h2>Table of Contents<\/h2>\n<div id=\"text-table-of-contents\">\n<ul>\n<li><a href=\"#orgd4d2faa\">1. Adversarial techniques for dialogue generation<\/a>\n<ul>\n<li><a href=\"#org34b8e85\">1.1. goal<\/a><\/li>\n<li><a href=\"#org5a36667\">1.2. motivation<\/a><\/li>\n<li><a href=\"#org320e019\">1.3. ingredients<\/a><\/li>\n<li><a href=\"#org7fd41ec\">1.4. steps<\/a><\/li>\n<li><a href=\"#org2deb0d9\">1.5. outlook<\/a><\/li>\n<li><a href=\"#org65c12a8\">1.6. resources<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h1>Adversarial techniques for dialogue generation<a id=\"orgd4d2faa\"><\/a><\/h1>\n<h2>goal<a id=\"org34b8e85\"><\/a><\/h2>\n<p>This week we are going to have a look at the latest developments of generative adversarial networks (GANs) in the field of dialogue generation by summarizing the paper &#8220;Adversarial Learning for Neural Dialogue Generation&#8221;.<\/p>\n<h2>motivation<a id=\"org5a36667\"><\/a><\/h2>\n<p>General encoder decoder models for response generation are usually not able to produce meaningful utterances and instead come up with short, generic, repetitive and non-informative sequences of words.<br \/>\nThe idea here is to apply adversarial methods so far only successful in computer vision to NLP problems, in particular dialogues.<br \/>\nAdversarial training with respect to modeling conversations can be considered as implementing the Turing test and assigning the role of the evaluator to another neural network, the discriminator.<\/p>\n<h2>ingredients<a 
id=\"org320e019\"><\/a><\/h2>\n<p>generator, discriminator, reinforce algorithm, beam search, adversarial training, sequence-to-sequence<\/p>\n<h2>steps<a id=\"org7fd41ec\"><\/a><\/h2>\n<p>Adversarial networks have been applied with a lot of success for image generation tasks.<br \/>\nA generator produces an image and tries to fool the discriminator in believing that the image is from the target distribution.<br \/>\nThe discriminator tries to distinguish between generated and target images and gives feedback to the generator.<br \/>\nHowever for NLP one soon discovers that text generation is discrete as opposed to image generation and therefore the network architecture becomes non-differentiable, which complicates the gradient calculation and updating the weights.<br \/>\nTherefore methods from reinforcement learning have to be applied to make the learning in adversarial networks with discrete output generation.<br \/>\nThe response generation policy is given by general sequence-to-sequence model which consists of a encoder mapping an utterance to a distributed vector representation and a decoder which takes this representation as an input to generate a sequence of words known as the response.<br \/>\nThis model can further be furnished with a nowadays popular attention mechanism.<br \/>\nThe discriminator is just a binary classifier with the labels machine- and human-generated which takes as input the response of the generator.<br \/>\nIt can be considered as an analogue to the human evaluator in the Turing test.<br \/>\nOne possible network architecture is the hierarchical encoder model.<br \/>\nThe output of the discriminator then provides an reward and directly gives feedback about the generators quality.<\/p>\n<p>Adversarial learning makes use of policy gradient training known from reinforcement learning.<br \/>\nBasically, we try to maximize the reward for the generated responses.<br \/>\nThe gradients needed to enhance the generator are provided by the 
REINFORCE algorithm and are in particular calculated using the likelihood-ratio trick.<br \/>\nKnown problems with this calculation are that the expectation of the reward is estimated from just a single sample and that the resulting gradients are identical for all token-generation steps.<br \/>\nIn other words, a standard discriminator cannot provide rewards for partially finished sequences.<br \/>\nOne way to reward partially finished sequences is to complete them via Monte Carlo search and average the rewards of the completed sequences.<br \/>\nThis method is very time-consuming, but the only alternative is to train a discriminator on partial sequences.<br \/>\nTo avoid overfitting on subsequences, only one positive and one negative example are sampled from all subsequences when training the discriminator.<br \/>\nIn this way, the generator receives a different gradient for every token.<br \/>\nThe training procedure for the discriminator follows standard gradient descent.<br \/>\nOne proposed alternative that avoids the need for the REINFORCE algorithm is to feed intermediate hidden states to the discriminator.<br \/>\nThis way the network architecture becomes differentiable.<\/p>\n<p>During training one notices that, even for pretrained sequence-to-sequence models, the reward on generated responses alone is not enough to enable learning.<br \/>\nSince the generator is exposed to the actual target response only indirectly through the reward, and since finding a good sequence in the huge space of possible responses is too difficult, the generator cannot learn properly.<br \/>\nTherefore, human-generated responses are additionally fed to the generator, either with a reward of one or with the reward determined by the discriminator.<br \/>\nThis regularizes the generator so that it does not deviate too much from the training data set.<\/p>\n<p>Finally, we have a look at 
the training details and some tricks to enhance the generation process.<br \/>\nBoth the generator, in the form of the sequence-to-sequence model with attention, and the discriminator are pretrained.<br \/>\nFor the discriminator, negative samples are generated using beam search or picked randomly from human-generated responses to other utterances.<br \/>\nFurther tricks that enhance the overall generation process include removing short responses from the data set, adjusting the learning rate depending on the average tf-idf score of the words in a response, penalizing repetition of words not on a stop-word list, and encouraging diversity during beam search.<\/p>\n<h2>outlook<a id=\"org2deb0d9\"><\/a><\/h2>\n<p>While not beneficial for machine translation, adversarial training does improve dialogue generation.<br \/>\nThe reason might lie in the difference between the distribution of generated responses and the target distribution.<br \/>\nIt is generally known that adversarial learning helps in situations where the generated distribution has high entropy.<br \/>\nIn the future, adversarial network architectures will find their way into NLP and drive progress on further text generation tasks, as seen here for dialogue generation.<\/p>\n<h2>resources<a id=\"org65c12a8\"><\/a><\/h2>\n<p><a href=\"https:\/\/arxiv.org\/abs\/1701.06547\">https:\/\/arxiv.org\/abs\/1701.06547<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Table of Contents 1. Adversarial techniques for dialogue generation 1.1. goal 1.2. motivation 1.3. 
ingredients [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[3642,3640],"tags":[3656],"class_list":["post-2055","post","type-post","status-publish","format-standard","hentry","category-ai-en","category-all-en","tag-baggage-en"],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/2055","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2055"}],"version-history":[{"count":2,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/2055\/revisions"}],"predecessor-version":[{"id":10862,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/2055\/revisions\/10862"}],"wp:attachment":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2055"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2055"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2055"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}