UNSUPERVISED INDUCTION AND FILLING OF SEMANTIC SLOTS FOR SPOKEN DIALOGUE SYSTEMS USING FRAME-SEMANTIC PARSING

SUMMARY

This paper focuses on a single question: given a large amount of unlabelled audio input, is it possible to categorize it automatically, without supervision, by applying the theory of frame semantics? Put another way, once a large set of audio samples has been parsed, how should the parsed constituents be grouped? The answer proposed here is to induce semantic slots from the data itself and then fill them, using a state-of-the-art frame-semantic parser together with a spectral-clustering-based slot-ranking model that adapts the parser's generic output to the target semantic space.

Traditionally, the semantic categorization of parsed audio has been annotated and compiled by hand, according to schemas created by developers or domain experts, a process that is time-consuming and expensive. With the method proposed here, predefined slots are unnecessary, and development and data-collection time can be greatly reduced.
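To make "filling a semantic slot" concrete, here is a toy sketch. The utterance, slot names, and value lists are invented for this summary (in the spirit of a restaurant-search domain), and the filler below is a deliberately naive stand-in for a real SLU component; the paper's point is that the slot labels themselves can be induced rather than predefined like this.

```python
# Hypothetical example: a hand-built schema predefines slot names and the
# values each slot may take. The paper aims to induce such labels
# automatically from data instead of writing them by hand.
utterance = "find me a cheap thai restaurant in the north part of town"

schema = {
    "price": {"cheap", "expensive"},
    "food": {"thai", "french"},
    "area": {"north", "south"},
}

def fill_slots(words, schema):
    """Toy filler: assign each slot the first word found in its value set."""
    found = {}
    for slot, values in schema.items():
        for w in words:
            if w in values:
                found[slot] = w
                break
    return found

print(fill_slots(utterance.split(), schema))
```

A real dialogue system would of course use far richer evidence than word lookup; the sketch only fixes the vocabulary of "slot" and "value" used throughout the summary.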
To cope with this variability in audio input, the authors use spoken language understanding (SLU) to connect utterances to the semantic representations that best capture a speaker's intentions.

Having developers and other professionals manually define slot schemas for each domain-specific task is expensive and labor-intensive, and it has a further drawback: because language is dynamic, hand-built categories can be limiting in real-world applications. Instead, the authors parse the audio inputs probabilistically and assign the parses to clusters according to whatever conversational parameters have been set. They aim to show that this approach, in which nobody manually defines the semantic slots into which the parses fall, is more efficient than, and just as accurate as, manual construction. To test this, they compare their automatic, frequency-based slot ranking against schemas compiled by hand.

The authors state that their main goal is "to use a FrameNet-trained statistical probabilistic semantic parser [14] to generate initial frame-semantic parses from automatic speech recognition (ASR) decodings of the raw audio conversation files." But how does one arrange the results of FrameNet parsing so that they are useful to a spoken dialogue system (SDS)?
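To make the parser's role above more concrete, here is a sketch of what frame-semantic parser output might look like. The `FrameParse` container and its field names are our own invention, not SEMAFOR's actual output format; "Expensiveness" and "Locale_by_use" are FrameNet-style frame names used purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical container for one frame-semantic parse of an ASR-decoded
# utterance. Field names are invented for this sketch.
@dataclass
class FrameParse:
    frame: str                  # frame evoked, e.g. "Expensiveness"
    lexical_unit: str           # word that evoked the frame, e.g. "cheap"
    elements: dict = field(default_factory=dict)  # frame-element role -> text span

# One utterance may evoke several frames; collecting frame names across a
# corpus yields candidate slot names for the induction step.
parses = [
    FrameParse("Expensiveness", "cheap", {"Goods": "restaurant"}),
    FrameParse("Locale_by_use", "restaurant"),
]
candidate_slots = [p.frame for p in parses]
```

The point of the representation is that generic frame names become candidate slot labels, which the slot-ranking model then adapts to the target domain.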
Working from the assumption that word meanings can largely be captured in a semantic framework, FrameNet breaks them down into three categories: frames, frame elements, and lexical units.

SEMAFOR predicts semantic categories fairly accurately when its inputs have been annotated and compiled by hand. Its parses into predetermined slots therefore serve as a kind of "control": the standard against which the authors judge their results.

The general idea is to divide the parses into "generic" and "domain-specific" ones. In the domain-specific setting, a slot's prominence is measured in two ways: word frequency and "coherence of values." A slot is seen as more prominent when the values slotted into it correspond to one another, and likewise when its words occur frequently. For the slot-ranking model itself, the authors chose spectral clustering, for three reasons. First, "spectral clustering is very easy to implement, and can be solved efficiently by standard linear algebra techniques." It is also "invariant to the shapes and densities of each cluster." Finally, "spectral clustering projects the manifolds within data into solvable space, and often outperform other clustering approaches."

The authors then verified their results in two ways. The first, discussed earlier, compares their results with slot schemas manually created by domain experts.
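The spectral clustering relied on above can be sketched minimally. This is not the paper's slot-ranking model, only the textbook core of spectral clustering under stated assumptions: a toy symmetric similarity matrix over four candidate items, the unnormalized graph Laplacian, and a two-way split on the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue).

```python
import numpy as np

def spectral_bipartition(W):
    """Split the nodes of a similarity graph W into two clusters
    using the sign of the Fiedler vector of the graph Laplacian."""
    D = np.diag(W.sum(axis=1))
    L = D - W                        # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]          # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy similarity matrix: items 0 and 1 are strongly similar, as are 2 and 3,
# with only weak cross-links; the spectral split recovers the two pairs.
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])
labels = spectral_bipartition(W)
```

Which pair gets label 0 or 1 depends on the eigenvector's arbitrary sign, so only the grouping is meaningful, a property any use of this sketch should account for.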
The second compares them with samples hand-annotated by actual human beings.

For the first test, relatedness is determined by assessing the quality of the semantic relationship between the two sets of slotted words: the authors match the parsed results of their spectral clustering against parses sorted into the predetermined semantic slots. Some differences are inevitable, so the task essentially reduces to measuring the shortest distance between two related words.

The second method is much easier to test. The authors simply extract the words from the framework generated by spectral clustering and compare them with the hand-parsed word lists. Matches are scored either strictly, where only exact word matches are accepted, or loosely, where at least one word in a slot with multiple parses must match. Overall, matches were frequent and fairly consistently accurate.

In conclusion, the proposed method shows that it is indeed possible to create an accurate framework and slots for parsed samples without the time-consuming task of defining those slots and frameworks by hand. Using spectral clustering, the authors generated frameworks that matched well with those created by developers, and they were also able to compare the resulting parses from these induced frameworks against hand-parsed inputs.
Although word-relatedness and match frequency dipped in some samples, the authors are clearly making progress toward further automating the development of spoken dialogue systems and spoken language understanding.

REFERENCES

"Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing" (PDF), 2013. https://www.cs.cmu.edu/~yww/papers/asru2013.pdf. Accessed 5 May 2016.