AKA Story

Open Domain Dialogue Dataset Comparison Report

Bach vs. Others

This document compares curated open-domain dialogue datasets available in the public domain with the data produced by AKA’s Bach data platform. The current report focuses on quantitative measurements that can be made transparently and that reflect objective differences in the data. The analysis used the following criteria:

Total Number of Tokens: The number of tokens measures the overall size of the dataset, which matters greatly when training modern deep-learning models. The Bach dataset clearly leads the others on this measure. Higher is better.

Vocabulary Size: Vocabulary size is the number of unique tokens appearing in the dataset. It reflects the variety of speech in the dialogues. Our dataset […]
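As a rough illustration of the two size criteria named above (total number of tokens and vocabulary size), the following minimal sketch computes both for a list of utterances. It is not the method used in the report: the function name `dataset_stats`, the sample utterances, and the tokenization (lowercasing plus whitespace splitting) are all assumptions made for the example.

```python
from collections import Counter


def dataset_stats(utterances):
    """Return (total_tokens, vocabulary_size) for a list of utterance strings."""
    counts = Counter()
    for utterance in utterances:
        # Assumption: lowercasing and whitespace splitting stand in for
        # whatever tokenizer the actual comparison used.
        counts.update(utterance.lower().split())
    total_tokens = sum(counts.values())  # every token occurrence
    vocabulary_size = len(counts)        # unique tokens only
    return total_tokens, vocabulary_size


# Hypothetical sample data, for illustration only.
sample = ["Hello there", "hello again my friend", "How are you"]
total, vocab = dataset_stats(sample)
print(total, vocab)  # 9 8
```

Because both numbers depend on how the text is tokenized, a comparison between datasets is only meaningful when the same tokenizer is applied to all of them.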

AKA’s Paper (ReSmart) is accepted by HIMS 2020

AKA’s paper has been accepted by the International Conference on Health Informatics and Medical Systems (HIMS) //americancse.org/events/csce2020/conferences/hims20 (July 2020)

Performance Evaluation of Bach’s Retrieval & Scoring System

Overview

Most current applications of automated dialogue systems involve narrowly focused language understanding and simple models of dialogue interaction. Understanding language and generating natural dialogue are important for building friendly dialogue-system interfaces, and they are particularly critical in settings where the speaker is focused on a 1D situation. Real human conversation is highly context-dependent, and human speakers jointly build contributions to the shared context. That is, a human dialogue has a complex structure in itself and also sits within a complex network of relations to other dialogues. AKA has continuously worked to build friendly dialogue interfaces and to interpret speaker utterances in a situation- and context-dependent way, including across multiple situations. Bach, the multiple linked dialogue data platform for AKA’s dialogue system, is our solution […]

Abusive Language Detection for Muse Engine

Overview

Bach, the multiple linked dialogue data platform for the Muse engine, draws on multiple resources (artificial intelligence, human reviewers, an automated rating system, etc.) to generate the best possible human-machine conversations, and noisy data is an unavoidable by-product of that development process. Noisy data is meaningless data, and the term can be extended to include abusive language, which posed challenges while we were developing the Muse engine. In this blog post, we describe the development of the Muse engine’s abusive language detection system and demonstrate its efficacy by comparing it with other models for detecting abusive language. In brief, AKA’s abusive language detection system has shown good performance by extracting additional features […]