Overview
Bach, a multi-linked dialogue data platform for the Muse engine, has used multiple resources (artificial intelligence, human reviewers, automated rating systems, etc.) in an effort to generate the best possible human-machine conversations, and noisy data is an inevitable by-product of this development process. Noisy data is meaningless data, and the term can be extended to include abusive language, which posed challenges we encountered while developing the Muse engine. In this blog post, we describe how we developed the Muse engine's abusive language detection system and demonstrate its efficacy by comparing it with different models on the task of detecting abusive language. In short, AKA's abusive language detection system has shown good performance by extracting additional features such as sentiment words.
Data
Abusive language refers to any type of insult, vulgarity, or profanity that debases its target; it can also be anything that causes aggravation. In other words, abusive language is something of a catch-all term. Detecting it is also often harder than one expects, for a variety of reasons. For instance, a message may seem harmless on its own but read as abusive once earlier messages in the thread are taken into account, and vice versa. Many difficulties also remain unsolved, such as sarcasm, the difficulty of tracking every insult directed at minority groups, and intentional obfuscation of words. To make the definition suitable for our task, we redefined abusive language to encompass hate speech, profanity, and derogatory language, with reference to the following datasets. To classify these subcategories of abusive language, we experimented with three datasets: the Wikipedia Abusive Language (Personal Attack) Data Set, the Aggressive Language Identification Dataset: Trolling, Aggression and Cyberbullying (TRAC1), and the Crowdflower Twitter Hate Speech Dataset.
Features
In the data pre-processing stage, we combine the three datasets used for this work. This involves removing unnecessary columns from the datasets and enumerating the classes. Classes such as toxic, severe toxic, aggressive, sexism, and racism, together with all associated labels, are considered abusive language under the definition given above. In the text pre-processing steps, we convert the texts to lowercase and remove unnecessary content. The input is then tokenized into words and converted into 300-dimensional word embedding vectors, using 1 million word vectors trained on Wikipedia with fastText and GloVe. We also use the Porter stemming algorithm to reduce inflected forms of words to a common stem. After combining the datasets into a consistent format, we randomly shuffle the samples and split them into two parts: a training set containing 70% of the samples and a test set containing 30%. The text pre-processing steps are summarized below:
- Remove irrelevant characters, space patterns, URLs, stopwords, and punctuation marks
- Tokenize words
- Porter stemming: remove the common morphological and inflectional endings from words
- Padding: standardize the input length
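The pre-processing steps above can be sketched in Python as follows. This is a minimal, dependency-free illustration, not the pipeline actually used for Muse: the label set `ABUSIVE_LABELS`, the tiny `STOPWORDS` list, the regular expressions, and the helper names (`to_binary_label`, `preprocess`, `build_vocab`, `encode`, `pad`, `shuffle_split`) are all hypothetical. Stemming is passed in as a function so the sketch runs without extra packages; with NLTK installed, `PorterStemmer().stem` from `nltk.stem` can be supplied instead. The padded id sequences would then index into a pretrained embedding matrix such as the fastText or GloVe vectors mentioned above.

```python
import random
import re
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical normalized class names; the post treats these (and related
# labels) as abusive. Anything else maps to the non-abusive class.
ABUSIVE_LABELS = {"toxic", "severe_toxic", "aggressive", "sexism", "racism"}

# Deliberately tiny, illustrative stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")  # punctuation and other symbols


def to_binary_label(label: str) -> int:
    """Map a dataset-specific class name to 1 (abusive) or 0 (not abusive)."""
    return int(label.strip().lower().replace(" ", "_") in ABUSIVE_LABELS)


def preprocess(text: str, stem: Callable[[str], str] = lambda w: w) -> List[str]:
    """Lowercase, strip URLs and punctuation, tokenize, drop stopwords, stem."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_ALNUM_RE.sub(" ", text)
    # str.split() with no argument also collapses runs of whitespace.
    return [stem(tok) for tok in text.split() if tok not in STOPWORDS]


def build_vocab(docs: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every token; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Replace tokens with ids, falling back to the <unk> id."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


def pad(ids: Sequence[int], max_len: int, pad_id: int = 0) -> List[int]:
    """Truncate or right-pad a sequence of ids to exactly max_len."""
    ids = list(ids[:max_len])
    return ids + [pad_id] * (max_len - len(ids))


def shuffle_split(samples: Iterable, test_ratio: float = 0.3,
                  seed: int = 42) -> Tuple[list, list]:
    """Shuffle, then return (train, test) with about test_ratio held out."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_ratio)
    return items[n_test:], items[:n_test]
```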
We extract n-gram features from the texts and weight them by their TF-IDF values. The goal of TF-IDF weighting is to reduce the effect of less informative tokens that appear very frequently across the datasets. We ran experiments with values of n ranging from one to three, so unigram, bigram, and trigram features are all considered. The TF-IDF of a term t in a document d combines the frequency of t in d with an inverse document frequency factor that grows as t appears in fewer documents, where D is the total number of documents in the corpus. We feed these features to both machine learning models and neural network based models.
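The n-gram weighting can be sketched in a few lines of standard-library Python. The sketch assumes the common unsmoothed variant tfidf(t, d) = tf(t, d) * log(D / df(t)), with tf(t, d) taken as the relative frequency of t in d and df(t) the number of documents containing t. The exact variant used in this work is not specified, so treat this as an illustration; the function names `ngrams` and `tfidf` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n_min: int = 1, n_max: int = 3) -> List[str]:
    """All contiguous n-grams for n in [n_min, n_max], joined with spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs: List[List[str]], n_min: int = 1,
          n_max: int = 3) -> List[Dict[str, float]]:
    """Weight each document's n-grams by tf(t, d) * log(D / df(t))."""
    doc_grams = [Counter(ngrams(doc, n_min, n_max)) for doc in docs]
    D = len(docs)
    df: Counter = Counter()
    for counts in doc_grams:
        df.update(counts.keys())  # each term counted once per document
    weighted = []
    for counts in doc_grams:
        total = sum(counts.values())
        weighted.append({
            term: (c / total) * math.log(D / df[term])
            for term, c in counts.items()
        })
    return weighted
```

A term that occurs in every document gets weight 0, which is exactly the down-weighting of frequent, uninformative tokens described above. In practice, scikit-learn's `TfidfVectorizer(ngram_range=(1, 3))` covers the same ground, but by default it uses a smoothed idf, ln((1 + D) / (1 + df(t))) + 1, and L2-normalizes each row, so its weights will not match this sketch exactly.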
In an attempt to find the best way of detecting abusive language, we not only experimented with different pre-processing techniques but also made an effort to obtain and include additional training data and to enrich the given texts with additional meta information. A typical example of such extra-linguistic information is sentiment. The sentiment features are determined using VADER (Valence Aware Dictionary and sEntiment Reasoner), which assigns each text a positive and a negative score.
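One way to use these scores, sketched below under stated assumptions, is to append them to each text's n-gram feature vector before classification. The post does not say how the sentiment scores were combined with the other features, so the concatenation, the helper names (`sentiment_features`, `augment`), and the toy lexicon in `toy_scorer` are hypothetical. The only external API referenced is VADER's `polarity_scores`, which returns a dict with `neg`, `neu`, `pos`, and `compound` keys; the sketch reads only `pos` and `neg`.

```python
from typing import Callable, Dict, List

# A scorer maps a text to a dict with at least "pos" and "neg" keys, which is
# the shape returned by VADER's SentimentIntensityAnalyzer().polarity_scores.
Scorer = Callable[[str], Dict[str, float]]


def sentiment_features(text: str, scorer: Scorer) -> List[float]:
    """Return [positive, negative] sentiment scores for one text."""
    scores = scorer(text)
    return [scores["pos"], scores["neg"]]


def augment(features: List[float], text: str, scorer: Scorer) -> List[float]:
    """Append the two sentiment scores to an existing dense feature vector."""
    return list(features) + sentiment_features(text, scorer)


def toy_scorer(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for VADER so this sketch runs without extra packages."""
    positive, negative = {"great", "nice", "love"}, {"stupid", "hate", "idiot"}
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    return {
        "pos": sum(tok in positive for tok in tokens) / n,
        "neg": sum(tok in negative for tok in tokens) / n,
    }


# With the vaderSentiment package installed, the real scorer would be:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   scorer = SentimentIntensityAnalyzer().polarity_scores
```

For example, `augment([0.1, 0.2], "I hate you", toy_scorer)` appends a positive score of 0.0 and a negative score of one third to the two existing features.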