{"id":1605,"date":"2016-07-18T17:38:24","date_gmt":"2016-07-18T08:38:24","guid":{"rendered":"http:\/\/blog.themusio.com\/?p=1605"},"modified":"2024-05-01T10:58:18","modified_gmt":"2024-05-01T01:58:18","slug":"musios-intent-classifier-2","status":"publish","type":"post","link":"https:\/\/blog.themusio.com\/?p=1605","title":{"rendered":"Musio&#8217;s intent classifier"},"content":{"rendered":"<div id=\"table-of-contents\">\n<h2>Table of Contents<\/h2>\n<div id=\"text-table-of-contents\">\n<ul>\n<li><a href=\"#org3302a60\">1. Musio&#8217;s intent classifier&#xa0;&#xa0;&#xa0;<span class=\"tag\"><span class=\"Musio\">Musio<\/span>&#xa0;<span class=\"keras\">keras<\/span>&#xa0;<span class=\"classifier\">classifier<\/span><\/span><\/a>\n<ul>\n<li><a href=\"#org7e9ff41\">1.1. goal<\/a><\/li>\n<li><a href=\"#org1618f0b\">1.2. motivation<\/a><\/li>\n<li><a href=\"#orgb91ab8e\">1.3. ingredients<\/a><\/li>\n<li><a href=\"#orgc9d3789\">1.4. steps<\/a><\/li>\n<li><a href=\"#orgc98e6ae\">1.5. outlook<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h1>Musio&#8217;s intent classifier     <a id=\"org3302a60\"><\/a><\/h1>\n<h2>goal<a id=\"org7e9ff41\"><\/a><\/h2>\n<p>In today&#8217;s post we use an example classifier for user utterances to explain how Musio determines the intent of the user.<\/p>\n<h2>motivation<a id=\"org1618f0b\"><\/a><\/h2>\n<p>Musio&#8217;s interior consists of several modules, each dedicated to a specific task, from question answering to general conversation, so we have to forward each user utterance to the module that is able to generate a sensible response to it.<\/p>\n<h2>ingredients<a id=\"orgb91ab8e\"><\/a><\/h2>\n<p>data set, cross validation, modules, classifier, keras, spacy<\/p>\n<h2>steps<a id=\"orgc9d3789\"><\/a><\/h2>\n<p>We skip the part on speech recognition and assume that we have received a properly transcribed user utterance.<br \/>\nThis is where we will begin our 
work on a classifier that will allow us to pick the module that is able to serve the intent of the user.<br \/>\nHere we will restrict ourselves to walking through a simplified classifier.<br \/>\nEven so, the results will prove satisfactory, and a more elaborate classifier should certainly be able to get the job done.<\/p>\n<h3>data<\/h3>\n<p>Before building the actual classifier we have to determine what kind of data it should be able to classify.<br \/>\nIn our case we consider only a small data set with about 50 data points per label.<br \/>\nCompared to other NLP data sets out there this is rather small, but it will fit our needs.<br \/>\nLet us now go over the classes into which the data points are grouped; each class can be directly associated with an appropriate module in Musio.<\/p>\n<p><strong>Labels<\/strong><\/p>\n<ul>\n<li>Calculation<\/li>\n<\/ul>\n<p>First, we want Musio to solve simple mathematical calculations, like addition, subtraction and so on.<br \/>\nTherefore Musio should recognize questions such as <em>&#8220;What is two plus five?&#8221;<\/em> and classify them accordingly.<\/p>\n<ul>\n<li>Unit conversion<\/li>\n<\/ul>\n<p>We further create a set of unit conversion tasks. Musio should respond, for example, to <em>&#8220;Convert fifty two Fahrenheit to Celsius!&#8221;<\/em>.<\/p>\n<ul>\n<li>Date questions<\/li>\n<\/ul>\n<p>Another set of questions and requests is related to date-specific tasks. 
The answer to the question <em>&#8220;How many days from now is my sister&#8217;s birthday?&#8221;<\/em> might be crucial for deciding whether to buy her a present in the next few days.<\/p>\n<ul>\n<li>Time questions<\/li>\n<\/ul>\n<p>We can further take into account a closely related task, telling the time in certain places on earth: <em>&#8220;How late is it in Munich?&#8221;<\/em>.<\/p>\n<ul>\n<li>Weather questions<\/li>\n<\/ul>\n<p>It might also be relevant to know the weather conditions there, so we consider a set of questions such as <em>&#8220;Will it be sunny this afternoon?&#8221;<\/em>.<\/p>\n<ul>\n<li>Factoid questions<\/li>\n<\/ul>\n<p>Musio is also capable of answering factoid questions, such as <em>&#8220;What is the mass of the earth?&#8221;<\/em>, and to do so these have to be labeled as such.<\/p>\n<ul>\n<li>The rest<\/li>\n<\/ul>\n<p>Finally, we provide a set of utterances that do not fit into these classes and should be treated by the general conversation module.<br \/>\nTo name one example, we can ask Musio for his favorite season of the year.<\/p>\n<p>This makes seven classes in total which we should be able to identify.<\/p>\n<h3>embedding<\/h3>\n<p>Starting from such a data set, there are many ways to write a successful classifier.<br \/>\nHowever, we will stick to recurrent neural networks since we are dealing with natural language input, in particular utterances of different lengths.<br \/>\nThis still allows for a wide range of possible implementations.<br \/>\nEven before we decide on a specific architecture we have to answer the question of embedding.<br \/>\nOne choice is to use character embeddings and to specify a dictionary that turns every utterance into a sequence of indices into the dictionary.<br \/>\nA clear disadvantage of this approach is the amount of data the model needs in order to come up with proper embeddings at the sentence level.<br \/>\nIn a milder form this problem also exists for the word-level 
approach if we only train the embeddings on our limited data set.<br \/>\nFurthermore, we have to live with a restricted dictionary and might not be able to understand every word.<\/p>\n<p>We decided to give <a href=\"https:\/\/spacy.io\">Spacy<\/a> a try.<br \/>\nSpacy is an NLP library that allows us to tokenize, parse and tag whole sentences.<br \/>\nIn particular, it provides useful word vectors for every word in a sentence.<br \/>\nOther libraries, like gensim, are also capable of that, but we will stick with Spacy.<br \/>\nUsing Spacy, we can create a reusable dataset class with the following main attributes:<\/p>\n<ul>\n<li><strong>X_all_sent<\/strong>: the raw strings of the sentences<\/li>\n<li><strong>X_all_vec_seq<\/strong>: sequences of word vectors, one vector for each word in a sentence<\/li>\n<li><strong>X_all_doc_vec<\/strong>: document vectors for the whole sentences<\/li>\n<li><strong>Y_all<\/strong>: labels for the sentences<\/li>\n<\/ul>\n<p>This way of preprocessing might also be useful if we present the data to more advanced sequence-to-sequence models in NLP tasks.<\/p>\n<pre><code class=\"python\">#In preprocessor.py\n# Written for Python 2 and an early spaCy release (spacy.en import path, unicode())\nfrom spacy.en import English\nfrom os import listdir\nfrom os.path import isfile, join\nimport numpy as np\n\nnlp = English()\n\n# One file per label inside the data directory\ndata_path = 'data'\nlabels = [f for f in listdir(data_path) if isfile(join(data_path, f))]\n\nclass Dataset(object):\n    def __init__(self):\n        X_all_sent = []\n        X_all_vec_seq = []\n        X_all_doc_vec = []\n        Y_all = []\n        for label in labels:\n            x_file = open(join(data_path, label))\n            x_sents = x_file.read().split('\\n')\n            x_file.close()\n            for x_sent in x_sents:\n                if len(x_sent) &gt; 0:\n                    x_doc = nlp(unicode(x_sent))\n                    # Normalize the document vector and every word vector to unit length\n                    x_doc_vec = x_doc.vector\/x_doc.vector_norm\n                    x_vec_seq = []\n                    for word in x_doc:\n                        
x_vec_seq.append(word.vector\/word.vector_norm)\n                    x_vec_seq = np.array(x_vec_seq)\n                    X_all_sent.append(x_sent)\n                    X_all_doc_vec.append(x_doc_vec)\n                    X_all_vec_seq.append(x_vec_seq)\n                    Y_all.append(label)\n\n        self.X_all_sent = X_all_sent\n        self.X_all_vec_seq = X_all_vec_seq\n        self.X_all_doc_vec = X_all_doc_vec\n        self.Y_all = Y_all\n<\/code><\/pre>\n<p>Before we dive into the actual model we perform some minor preprocessing by padding sentences to the same length and splitting the data set into training and validation sets.<\/p>\n<pre><code class=\"python\">#In preprocessor.py\ndef pad_vec_sequences(sequences,maxlen=40):\n    new_sequences = []\n    for sequence in sequences:\n        orig_len, vec_len = np.shape(sequence)\n        if orig_len &lt; maxlen:\n            # Shorter sequences are left-padded with zero vectors\n            new = np.zeros((maxlen,vec_len))\n            new[maxlen-orig_len:,:] = sequence\n        else:\n            # Longer sequences keep only their last maxlen vectors\n            new = sequence[orig_len-maxlen:,:]\n        new_sequences.append(new)\n    return np.array(new_sequences)\n<\/code><\/pre>\n<h3>classifier<\/h3>\n<p>There are quite a number of popular deep learning libraries out there, and since our classification task is fairly standard, we chose Keras for our implementation.<\/p>\n<p>In this example, we will implement a bidirectional LSTM on sequences of word vectors.<\/p>\n<p>Prepare the dataset:<\/p>\n<pre><code class=\"python\">#In train.py\nfrom preprocessor import Dataset, pad_vec_sequences\n# sklearn.cross_validation is the pre-0.18 module; newer releases moved\n# train_test_split to sklearn.model_selection\nfrom sklearn import preprocessing, cross_validation\n\nfrom keras.preprocessing import sequence\nfrom keras.models import Model\nfrom keras.layers import Dense, Dropout, Embedding, LSTM, Input, merge, GRU\nfrom keras.utils import np_utils, generic_utils\n\nds = Dataset()\nX_all = pad_vec_sequences(ds.X_all_vec_seq)\n# Encode the string labels as integers; to_categorical below expects class indices\nY_all = preprocessing.LabelEncoder().fit_transform(ds.Y_all)\nx_train, x_test, y_train, y_test = cross_validation.train_test_split(\n                                   
X_all,Y_all,test_size=0.2)\ny_train, y_test = [np_utils.to_categorical(x) for x in (y_train, y_test)]\n<\/code><\/pre>\n<p>The model architecture is as follows:<\/p>\n<pre><code class=\"python\">#In train.py\n# maxlen, hidden_dim and nb_classes have to be defined beforehand\n# (maxlen matches the padding length, nb_classes is seven for our labels)\nsequence = Input(shape=(maxlen,300), dtype='float32')\nforwards = LSTM(hidden_dim,dropout_W=0.1,dropout_U=0.1)(sequence)\nbackwards = LSTM(hidden_dim,dropout_W=0.1,dropout_U=0.1,go_backwards=True)(sequence)\nmerged = merge([forwards, backwards], mode='concat', concat_axis=-1)\nafter_dp = Dropout(0.1)(merged)\noutput = Dense(nb_classes, activation='softmax')(after_dp)\nmodel = Model(input=sequence, output=output)\n<\/code><\/pre>\n<p>We start with a simple input layer that takes the word vectors (length 300) created by Spacy. In order to generate an utterance embedding we put the sequence through a bidirectional LSTM with some minor dropout. In Keras this takes a forward and a backward LSTM whose outputs are then merged. In the final layer we reduce the dimension to the number of classes in order to perform the classification, and use the softmax to produce a probability distribution over the labels.<br \/>\nSetting up the optimization takes only one line:<\/p>\n<pre><code class=\"python\">model.compile('adam', 'categorical_crossentropy')\n<\/code><\/pre>\n<p>Here we chose categorical cross entropy as the loss function and the Adam optimizer, a variant of stochastic gradient descent with adaptive learning rates.<br \/>\nNext we trained this classifier for a number of epochs.<\/p>\n<pre><code class=\"python\"># batch_size and num_epoch are hyperparameters that have to be set beforehand\nmodel.fit(x_train, y_train,\n          batch_size=batch_size,\n          nb_epoch=num_epoch,\n          validation_data=(x_test, y_test))\n<\/code><\/pre>\n<p>In the end, even this simplified classifier produced an acceptable classification rate over the labels, and we expect that it would stand up to more complex classification tasks with more modules.<\/p>\n<h2>outlook<a id=\"orgc98e6ae\"><\/a><\/h2>\n<p>In an upcoming post we will have a look at the emotion classifier, which apart from the intent 
classifier plays a major role in Musio&#8217;s handling of user utterances and its emotional responses.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Table of Contents 1. Musio&#8217;s intent classifier&#xa0;&#xa0;&#xa0;Musio&#xa0;keras&#xa0;classifier 1.1. go [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[3640,3644],"tags":[3650,3652,3758,3760,3698,3656,3762,3658,3788,3700,3664,3956,3958,3960,3710,3918,3962],"class_list":["post-1605","post","type-post","status-publish","format-standard","hentry","category-all-en","category-musio-en","tag-ai-ja-en","tag-aka-ja-en","tag-aka-intelligence-en","tag-artificial-intelligence-en","tag-backpropogation-en","tag-baggage-en","tag-children-book-ja-en","tag-christmas-en","tag-classifier-en","tag-cmos-en","tag-crowd-funding-en","tag-data-set-en","tag-keras-en","tag-modules-en","tag-musio-en","tag-spacy-en","tag--en"],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1605","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1605"}],"version-history":[{"count":16,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1605\/revisions"}],"predecessor-version":[{"id":1
0875,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1605\/revisions\/10875"}],"wp:attachment":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1605"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1605"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1605"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}