AKA Story

Musio’s emotion classifier


In today’s summary we take a look at the emotion classifier used in Musio and lay out some details of the data and models we use.


Sentiment classification is an important task in general, because as humans we never intend to convey plain content alone.
In human interaction, the way we phrase things is as important as the message itself.
Misinterpreting the emotions of one’s counterpart can lead to awkward situations.
Hence, Musio has to learn to read the emotional state of its users in order to take part in their daily lives.


emotion, sentiment analysis, MLP, spacy, facial expressions


As with every task or problem in machine learning, we started by gathering an appropriate data set for building an emotional system for Musio.
Data sets for sentiment analysis are mostly limited to the domain of movie ratings, and consequently text fragments are most often labeled only as positive or negative opinion.
Similar limitations hold for product reviews and ratings.
That’s why we created our own data set consisting of the following features and labels.
We labeled each sentence with one of nine emotional states.
These are joy, trust, anticipation, surprise, sadness, fear, disgust, anger and a neutral label.
In addition, we came up with three further features that allow us to characterize the pleasure a user experiences, the physiological and psychological state a user is in (also called arousal), and the dominance that is exerted.
To give an example, consider the following sentence.

“Playing games is always fun and relaxing.”

First we associate an emotion with it, say joy in this case.
Then we fix a value for the pleasure between -9 and +9, here maybe 3, a value for the arousal between 0 and 9, say 5, and a value for the dominance between 0 and 9, say 6.
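
To make the labeling scheme concrete, here is a minimal sketch of how one labeled sentence could be represented. The class name `LabeledSentence` and its field names are hypothetical and chosen for illustration only; the emotion labels and value ranges are the ones described above.

```python
from dataclasses import dataclass

# The nine emotional states used as labels.
EMOTIONS = ("joy", "trust", "anticipation", "surprise", "sadness",
            "fear", "disgust", "anger", "neutral")


@dataclass(frozen=True)
class LabeledSentence:
    """One annotated sentence (hypothetical record layout)."""
    text: str
    emotion: str      # one of EMOTIONS
    pleasure: float   # between -9 and +9
    arousal: float    # between 0 and 9
    dominance: float  # between 0 and 9

    def __post_init__(self):
        # Reject labels and values outside the ranges described in the text.
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        if not -9 <= self.pleasure <= 9:
            raise ValueError("pleasure must lie in [-9, 9]")
        if not 0 <= self.arousal <= 9:
            raise ValueError("arousal must lie in [0, 9]")
        if not 0 <= self.dominance <= 9:
            raise ValueError("dominance must lie in [0, 9]")


# The example sentence with the values picked above.
example = LabeledSentence(
    text="Playing games is always fun and relaxing.",
    emotion="joy",
    pleasure=3,
    arousal=5,
    dominance=6,
)
```

Validating the ranges at construction time catches annotation slips early, before they reach training.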

Before we discuss the models, we spend some more time on properly handling the data.
For the encoding part of our language models, we again rely on spaCy, which generates word vectors and whole-sentence vectors for us.
In the next step we normalize the values for pleasure, arousal and dominance using their mean and variance.
As a last step we have to take care of a bias in our data set: the neutral label is quite frequent compared to the other emotional states.
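
The three data handling steps above can be sketched in plain Python. This is an illustration under stated assumptions, not our actual pipeline: the toy lookup table stands in for spaCy’s word vectors, the averaging skips unknown tokens (spaCy’s own handling of such tokens may differ), and inverse-frequency class weights are just one common way to counter a dominant label. All function names are hypothetical.

```python
from collections import Counter
from statistics import fmean, pstdev


def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of all known tokens (simplified stand-in
    for a sentence vector built from word vectors)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [fmean(column) for column in zip(*known)]


def standardize(values):
    """Shift to zero mean and scale to unit variance."""
    mean = fmean(values)
    std = pstdev(values, mean)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def class_weights(labels):
    """Inverse-frequency weights: rare labels get larger weights.
    Assumption: one possible remedy, not necessarily the one used for Musio."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Toy data, invented for illustration only.
toy_vectors = {"games": [1.0, 0.0], "fun": [0.0, 1.0]}
vec = sentence_vector(["games", "are", "fun"], toy_vectors, dim=2)

pleasure_scaled = standardize([3, -2, 7, 0])

weights = class_weights(["neutral"] * 6 + ["joy"] * 2 + ["fear"] * 2)
```

With weights like these, errors on rare emotions count more during training than errors on the frequent neutral label.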

For the task at hand, which is to label sentences with an emotion, we use the deep learning library Keras to build our models quickly and experiment easily.
As our model we choose a multi-layer perceptron (MLP) and feed it with the sentence vectors created by spaCy.

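from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(100, input_dim=300, kernel_initializer='random_uniform'))  # 300-d sentence vector -> 100 hidden units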



We reduce the dimension of the layers towards the output and finally end up with a distribution over the different emotional states.
In between we apply tanh as the non-linear activation function and use a certain amount of dropout.
Instead of using spaCy’s precomputed sentence vectors, we also experimented with building an encoder model consisting of LSTMs, initializing the word embedding layer with word2vec.
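
To illustrate what the MLP computes, here is a dependency-free sketch of a forward pass: shrinking dense layers, tanh activations, dropout during training, and a softmax that yields a distribution over the nine emotions. The layer sizes beyond the first (300 to 100), the dropout rate, and the random weights are assumptions made for illustration; in practice Keras takes care of all of this.

```python
import math
import random

EMOTIONS = ["joy", "trust", "anticipation", "surprise", "sadness",
            "fear", "disgust", "anger", "neutral"]


def dense(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def dropout(x, rate, rng, training):
    """Inverted dropout: only active during training."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small uniform random weights and zero biases."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers, rng, training=False, rate=0.5):
    """tanh + dropout on hidden layers, softmax on the output layer."""
    *hidden, last = layers
    for weights, bias in hidden:
        activated = [math.tanh(v) for v in dense(x, weights, bias)]
        x = dropout(activated, rate, rng, training)
    return softmax(dense(x, *last))


rng = random.Random(0)
sizes = [300, 100, 50, len(EMOTIONS)]  # hypothetical layer sizes
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

sentence_vec = [rng.gauss(0.0, 1.0) for _ in range(300)]  # stand-in input
probs = forward(sentence_vec, layers, rng)
predicted = EMOTIONS[probs.index(max(probs))]
```

Since the weights here are untrained, the predicted label is meaningless; the point is only the shape of the computation, from a 300-dimensional sentence vector to nine probabilities.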

We additionally use a similar model that allows us to infer the pleasure, arousal and dominance values from a given user utterance.
In fact, it is the output of this model at inference time that controls Musio’s physiology.
We use the pleasure, arousal and dominance values to coordinate Musio’s facial expressions and to decide on the color of his heart.
Furthermore, we intend to adjust the sound of Musio’s voice according to these values.
Finally, Musio’s emotional state changes depending on the classification of the user utterance, which in turn might lead to distinct emotional responses.


In an upcoming blog post, we are going to take a look at the entity recognition system implemented in Musio and his ability to distinguish gender.
