
Musio’s intent classifier



In today’s post we walk through an example classifier for user utterances to explain how Musio determines the intent of the user.


Musio’s interior consists of several modules, each dedicated to a specific task, from question answering to general conversation. We therefore have to forward each user utterance to the module that is able to generate a sensible response to it.


data set, cross validation, modules, classifier, keras, spacy


We skip the part on speech recognition and assume that we have received a properly transcribed user utterance.
This is where we begin our work on a classifier that picks the module able to serve the intent of the user.
Here we restrict ourselves to a simplified classifier.
The results are nevertheless satisfactory, and a more elaborate classifier should certainly be able to get the job done.


Before building the actual classifier we have to determine what kind of data it should be able to classify.
In our case we consider only a small data set with about 50 data points per label.
Compared to other data sets out there for NLP this seems rather small, but it will fit our needs.
Let us now go over the classes that the data points are grouped into. Each class can be directly associated with an appropriate module in Musio.


  • Calculation

First, we want Musio to solve simple mathematical calculations, like addition, subtraction and so on.
Therefore Musio should recognize questions, such as “What is two plus five?” and classify them accordingly.

  • Unit conversion

We further create a set of unit conversion tasks. Musio should respond, for example, to “Convert fifty two Fahrenheit to Celsius!”.

  • Date questions

Another set of questions and requests is related to date-specific tasks. The answer to the question “How many days from now is my sister’s birthday?” might be crucial for deciding whether to buy her a present in the next few days.

  • Time questions

A closely related task is telling the time in a given place on earth: “How late is it in Munich?”.

  • Weather questions

It might also be relevant to know the weather conditions there, so we consider a set of questions such as “Will it be sunny this afternoon?”.

  • Factoid questions

Musio is also capable of answering factoid questions such as “What is the mass of the earth?”. To do so, these questions have to be labeled as such.

  • The rest

Finally, we provide a set of utterances that do not fit into these classes and should be treated by the general conversation module.
To name one example, we can ask Musio for his favorite season of the year.

This makes seven classes in total that we should be able to identify.
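
To make the link between classes and modules concrete, here is a minimal sketch of how a predicted label could be routed. The label strings and module names are hypothetical placeholders, since this post does not name Musio's internal modules.

```python
# Hypothetical label and module names, used only for illustration.
INTENT_TO_MODULE = {
    "calculation": "calculator",
    "unit_conversion": "unit_converter",
    "date": "date_module",
    "time": "time_module",
    "weather": "weather_module",
    "factoid": "question_answering",
    "rest": "general_conversation",
}


def route(predicted_label):
    """Return the module for a predicted intent.

    Unknown labels fall back to the general conversation module.
    """
    return INTENT_TO_MODULE.get(predicted_label, INTENT_TO_MODULE["rest"])


print(route("weather"))
print(route("something_else"))
```

Falling back to general conversation for anything unrecognized mirrors the role of the "rest" class described above.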


Starting from such a data set, there might be a lot of ways to write a successful classifier.
However, we will stick to recurrent neural networks since we are dealing with natural language input, in particular utterances of different length.
This still allows for a wide range of possible implementations.
Even before we decide on a specific architecture we have to answer the question of embedding.
One choice is to use character embeddings and to specify a dictionary that turns every utterance into a sequence of indices.
A clear disadvantage of this approach is the amount of data the model needs in order to learn proper embeddings at the sentence level.
In a milder form this problem also exists for the word-level approach if we only train the embeddings on our limited data set.
Further, we have to live with a restricted dictionary and might not be able to understand every word.
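
The dictionary approach can be sketched in a few lines. This is only an illustration with made-up helper names (`build_vocab`, `encode`) and two toy utterances. It shows both the character-level and the word-level variant, and how a word missing from the restricted dictionary collapses to an "unknown" index.

```python
def build_vocab(utterances, level="char"):
    """Map each character (or lower-cased word) to an integer index.

    Index 0 is reserved for symbols that are not in the dictionary.
    """
    vocab = {"<unk>": 0}
    for utt in utterances:
        symbols = list(utt) if level == "char" else utt.lower().split()
        for s in symbols:
            vocab.setdefault(s, len(vocab))
    return vocab


def encode(utterance, vocab, level="char"):
    """Turn an utterance into a sequence of dictionary indices."""
    symbols = list(utterance) if level == "char" else utterance.lower().split()
    return [vocab.get(s, 0) for s in symbols]


train = ["what is two plus five", "convert two miles to kilometers"]
char_vocab = build_vocab(train, "char")
word_vocab = build_vocab(train, "word")

# "nine" never occurred in the training utterances, so it maps to index 0.
print(encode("two plus nine", word_vocab, "word"))  # [3, 4, 0]
```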

We decided to give spaCy a try.
spaCy is an NLP library that allows us to tokenize, parse and tag whole sentences.
In particular, it provides useful word vectors for every word in a sentence.
Other libraries, like gensim, are also capable of that, but we will stick with spaCy.
Using spaCy, we can create a reusable dataset class with the following main attributes:

  • X_all_sent: the raw string of the sentences
  • X_all_vec_seq: sequences of word vectors for each word in the sentences
  • X_all_doc_vec: document vectors for the whole sentences
  • Y_all: labels for the sentences

This way of preprocessing might also be useful if we present the data to more enhanced sequence-to-sequence models in NLP tasks.

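from spacy.en import English  # spaCy 1.x style import
from os import listdir
from os.path import isfile, join
import numpy as np

nlp = English()

data_path = 'data'
# One file per class; the file name (without extension) serves as the label.
labels = [f for f in listdir(data_path) if isfile(join(data_path, f))]

class Dataset(object):
    def __init__(self):
        X_all_sent = []
        X_all_vec_seq = []
        X_all_doc_vec = []
        Y_all = []
        for label in labels:
            with open(join(data_path, label)) as x_file:
                x_sents = x_file.read().split('\n')
            for x_sent in x_sents:
                if len(x_sent) > 0:
                    x_doc = nlp(unicode(x_sent))  # Python 2 string handling
                    # Document vector scaled to unit length
                    x_doc_vec = x_doc.vector/x_doc.vector_norm
                    # One word vector per token in the sentence
                    x_vec_seq = []
                    for word in x_doc:
                        x_vec_seq.append(word.vector)
                    x_vec_seq = np.array(x_vec_seq)
                    X_all_sent.append(x_sent)
                    X_all_vec_seq.append(x_vec_seq)
                    X_all_doc_vec.append(x_doc_vec)
                    Y_all.append(label.split('.')[0])

        self.X_all_sent = X_all_sent
        self.X_all_vec_seq = X_all_vec_seq
        self.X_all_doc_vec = X_all_doc_vec
        self.Y_all = Y_all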

Before we dive into the actual model we perform some minor preprocessing by padding sentences to the same length and splitting the data set into training and validation.

def pad_vec_sequences(sequences, maxlen=40):
    new_sequences = []
    for sequence in sequences:
        orig_len, vec_len = np.shape(sequence)
        if orig_len < maxlen:
            # Short sentence: left-pad with zero vectors
            new = np.zeros((maxlen, vec_len))
            new[maxlen-orig_len:, :] = sequence
        else:
            # Long sentence: keep only the last maxlen word vectors
            new = sequence[orig_len-maxlen:, :]
        new_sequences.append(new)
    return np.array(new_sequences)


There are quite a number of popular deep learning libraries out there, and since our classification task is fairly standard, we chose Keras for our implementation.

In this example, we will implement a bidirectional LSTM on sequences of word vectors.

Prepare the dataset:

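from preprocessor import Dataset, pad_vec_sequences
from sklearn import preprocessing, cross_validation  # cross_validation: older scikit-learn

from keras.preprocessing import sequence
from keras.models import Model
from keras.layers import Dense, Dropout, Embedding, LSTM, Input, merge, GRU
from keras.utils import np_utils, generic_utils

ds = Dataset()
X_all = pad_vec_sequences(ds.X_all_vec_seq)
# Encode the string labels as integers so that to_categorical can one-hot them
# (assumed step; it is not part of the listing as published).
Y_all = preprocessing.LabelEncoder().fit_transform(ds.Y_all)
# Any further split arguments (such as the test size) are not known; defaults are used.
x_train, x_test, y_train, y_test = cross_validation.train_test_split(X_all, Y_all)
y_train, y_test = [np_utils.to_categorical(x) for x in (y_train, y_test)]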
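
For readers who want to see what the label preparation amounts to, the following NumPy sketch reproduces the idea of integer encoding followed by one-hot encoding on four toy labels. It is a stand-in for illustration, not the Keras utility itself.

```python
import numpy as np

labels = ["weather", "time", "weather", "calculation"]

# Sorted unique class names and, for every sample, the index of its class.
classes, y_int = np.unique(labels, return_inverse=True)

# One row per sample, one column per class, with a single 1 per row.
y_onehot = np.eye(len(classes))[y_int]

print(classes)
print(y_int)
print(y_onehot)
```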

The model architecture is as follows:

maxlen = 40      # must match the padding length used in pad_vec_sequences
hidden_dim = 32  # assumed value; the size actually used is not stated
nb_classes = 7   # the seven intent classes described above

sequence = Input(shape=(maxlen,300), dtype='float32')
forwards = LSTM(hidden_dim,dropout_W=0.1,dropout_U=0.1)(sequence)
backwards = LSTM(hidden_dim,dropout_W=0.1,dropout_U=0.1,go_backwards=True)(sequence)
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
after_dp = Dropout(0.1)(merged)
output = Dense(nb_classes, activation='softmax')(after_dp)
model = Model(input=sequence, output=output)

We start with a simple input layer that handles the word vectors (length 300) created by spaCy. In order to generate an utterance embedding we put the sequence through a bidirectional LSTM and some minor dropout. In Keras this actually takes a forward and a backward LSTM that are then merged. In the final layer we reduce the dimension to the number of classes in order to perform the classification, and produce a probability distribution over the labels by using the softmax.
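
As a small worked example of the last step, here is the softmax applied to seven made-up scores, one per class. The numbers are purely illustrative and do not come from the trained model.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


# Made-up scores for the seven intent classes.
logits = np.array([2.0, 0.5, 0.1, -1.0, 0.0, 0.3, 1.2])
probs = softmax(logits)
predicted = int(np.argmax(probs))

print(probs.round(3), predicted)
```

The class with the highest probability is taken as the predicted intent.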
The actual optimization procedure takes only one line:

model.compile('adam', 'categorical_crossentropy')

Here we chose categorical cross-entropy as the loss function and the Adam optimizer, an adaptive variant of stochastic gradient descent.
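
To give a feeling for the loss, the following NumPy sketch computes categorical cross-entropy for two toy predictions over three classes. It is a simplified stand-in for what the library computes, with a hypothetical clipping constant `eps` to avoid taking the logarithm of zero.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))


y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.9, 0.05, 0.05],
                   [0.2, 0.7, 0.1]])

loss = categorical_crossentropy(y_true, y_pred)
print(loss)
```

The more probability the model puts on the correct label, the closer the loss gets to zero.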
Next we trained this classifier over some epochs.

model.fit(x_train, y_train,
          validation_data=[x_test, y_test])

In the end, even this simplified classifier produced an acceptable classification rate across the labels, and we expect that the approach would hold up for more complex classification tasks with more modules.


In an upcoming post we will have a look at the emotion classifier, which, alongside the intent classifier, plays a major role in how Musio handles user utterances and responds with sentiment.
