{"id":1250,"date":"2016-04-06T09:27:07","date_gmt":"2016-04-06T09:27:07","guid":{"rendered":"http:\/\/blog.themusio.com\/?p=1250"},"modified":"2024-05-01T11:06:54","modified_gmt":"2024-05-01T02:06:54","slug":"a-method-for-measuring-sentence-similarity-and-its-application-to-conversational-agents","status":"publish","type":"post","link":"https:\/\/blog.themusio.com\/?p=1250","title":{"rendered":"A method for measuring sentence similarity and its application to Conversational Agents"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">In this paper, semantic information and word order are used in order to determine sentence similarity. \u00a0In order to do that, the authors introduce their proposed algorithm, which takes into account our common, shared lexical corpus and the frequency of words that are similar across sentences. \u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the past, the analysis of text similarity has mostly been applied to long texts. \u00a0For the most part it was successful because long texts contain enough words for similar or shared words to occur frequently. \u00a0On the other hand, shorter texts were harder to analyze in this way because there was a limited number of words to work with, ultimately meaning a smaller chance of success. \u00a0Here, the authors are attempting to develop an algorithm that analyzes these short texts differently from traditional methods in two ways. \u00a0First, it analyzes text only at a sentence-to-sentence level rather than as a whole, as is done with long-form documents. Second, it integrates word-order data to better detect sentence similarity. In long texts, it is easier to identify how words and their overall order give the text meaning and carry information. \u00a0In short texts, the task is a little bit more difficult. 
Essentially, \u201cThe task is to establish a computational method that is able to measure the similarity between very short texts (sentences).\u201d <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Traditionally, \u201cinformation retrieval methods used a set of a pre-determined index terms (words or collocations) that are used to represent a document in the form of a document-term vector.\u201d \u00a0This is problematic, however, because the vector will have a very small number of nonzero entries when used on short texts. It can also cause certain keywords or phrases, or even a sentence\u2019s overall relevance, to be missed because the term set is defined too broadly. \u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">First, it is important to examine word similarity. \u00a0Using Miller\u2019s WordNet, word similarity is measured by the shortest distance traveled in a hierarchical architecture of synonyms and semantic meaning. \u00a0A formula proposed by the authors sums up this process by essentially describing a way to determine the shortest relational distance between two words, taking into account the depth at which the words begin to share meaning in the hierarchy. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Next, it is easy for humans to understand sentence similarity. \u00a0\u201cThe quick brown fox jumps over the lazy dog\u201d and \u201cThe quick brown dog jumps over the lazy fox\u201d both present similar information, but the subject and object in each sentence reverse places. \u00a0It is easy for us to make that distinction in natural conversation, but translating this into a computational language for the use of natural language processing is no small task. \u00a0In these cases, word order similarity defines sentence similarity, and this is what the authors intend to measure. In the example provided in this paragraph, both sentences are similar except for the fact that the 4th and 9th words in each are switched. 
\u00a0So, the authors\u2019 solution is to assign each word in a sentence a unique numerical value and then compare the sentences: the more often the same values appear in the same positions across the two sentences, the higher their measured word order similarity. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">The authors state that \u201cSemantic similarity represents the lexical similarity. On the other hand, word order similarity provides information about the relationship between words: which words appear in the sentence, and which words come before or after which other words.\u201d \u00a0So, both semantic and syntactic similarity have bearing on sentence similarity. \u00a0When words share similarities in meaning, we can begin to map sentence similarity. \u00a0Word order and syntax have bearing on word similarity as well, which further contributes to accurately determining whether sentences share similarities. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Now that the authors have determined a way to assess levels of sentence similarity, they need to implement it so that conversational agents can make use of it. \u00a0Since the research here is going to be used to help machines process input and communicate using language in a more human manner, the implementation must adhere to pattern-based rules (because conversational agents rely on stimulus patterns and response patterns to carry on dialogue). \u00a0So, in order to complete a dataset that is useful to natural language processing, every possible stimulus and response must be logged and archived if there is to be any semblance of natural dialogue. \u00a0Since language is so dynamic, these datasets tend to be long. \u00a0However, using the authors\u2019 algorithm to determine sentence similarity, one could conceivably restrict the length of these datasets. 
\u00a0For instance, when a set of possible stimulus patterns is considered, there could be several words pertaining to <\/span><i><span style=\"font-weight: 400;\">child <\/span><\/i><span style=\"font-weight: 400;\">(<\/span><i><span style=\"font-weight: 400;\">kid<\/span><\/i><span style=\"font-weight: 400;\">, <\/span><i><span style=\"font-weight: 400;\">boy<\/span><\/i><span style=\"font-weight: 400;\">, <\/span><i><span style=\"font-weight: 400;\">girl<\/span><\/i><span style=\"font-weight: 400;\">, etc.), and a response is built around one or more of these words individually, thus adding similar response possibilities to the dataset. \u00a0However, using word similarity vectors, the dataset could be shortened by combining possible responses that contain words that are deemed to fall within the limits of whatever semantic similarity the words are seen to share (i.e., there don\u2019t need to be four separate data entries for <\/span><i><span style=\"font-weight: 400;\">child, kid, boy, girl<\/span><\/i><span style=\"font-weight: 400;\">). This eliminates sentences that differ only slightly in wording and not in overall meaning, allowing the dataset of possible responses to be shorter and the conversational agent to respond faster. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">So, pattern matching might not be anything new in natural language processing, but comparing sentence similarity has opened the doors to some new and potentially groundbreaking advancements in the use of dialogue agents. \u00a0The rules and datasets are drastically shorter, and that makes them far more readable and easier to maintain and update as developments are made. \u00a0Essentially, this research was done in order to simplify a conversational agent\u2019s representation of knowledge (potential stimuli and responses) and how it processes that knowledge. 
\u00a0This ultimately shortens the entire process and makes it more efficient, which matters because natural conversation relies on how quickly dialogue agents can assess input, measure the appropriateness of a response, and issue that response. <\/span><\/p>\n<p><strong>Resources<\/strong><\/p>\n<p><span style=\"font-weight: 400;\">Li, Yuhua, Zuhair Bandar, David McLean, and James O&#8217;Shea. &#8220;<a href=\"http:\/\/www.aaai.org\/Papers\/FLAIRS\/2004\/Flairs04-139.pdf\" target=\"_blank\" rel=\"noopener\">A Method for Measuring <\/a><\/span><span style=\"font-weight: 400;\"><a href=\"http:\/\/www.aaai.org\/Papers\/FLAIRS\/2004\/Flairs04-139.pdf\" target=\"_blank\" rel=\"noopener\">Sentence Similarity and Its Application to Conversational Agents.<\/a>&#8221; <\/span><i><span style=\"font-weight: 400;\">Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference<\/span><\/i><span style=\"font-weight: 400;\"> (2004). <\/span><i><span style=\"font-weight: 400;\">AAAI<\/span><\/i><span style=\"font-weight: 400;\">. Web.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this paper, semantic information and word order are used in order to determine sentence similarity. 
\u00a0In ord [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[3642,3640],"tags":[3650,3652,3760,3656,3762,3658,3788,3700,3790,3664,4126,3816,3710,4128],"class_list":["post-1250","post","type-post","status-publish","format-standard","hentry","category-ai-en","category-all-en","tag-ai-ja-en","tag-aka-ja-en","tag-artificial-intelligence-en","tag-baggage-en","tag-children-book-ja-en","tag-christmas-en","tag-classifier-en","tag-cmos-en","tag-conversational-agents-en","tag-crowd-funding-en","tag-dialogue-en","tag-language-en","tag-musio-en","tag-wordnet-en"],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1250","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1250"}],"version-history":[{"count":3,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1250\/revisions"}],"predecessor-version":[{"id":10889,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1250\/revisions\/10889"}],"wp:attachment":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1250"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.themusio.com\/ind
ex.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1250"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1250"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}