Bach: data architecture for multi-linked dialogues
It is important for recent dialogue systems to learn from human-human conversations in order to generate best human-machine conversations. Normally, the process of dialogue system may be summarized as follows: when a user asks a question, the system either searches a correct response in a set of candidate responses or generates the best response from the candidates. That is, much depends on how to construct massive human-human conversation datasets efficiently and meaningfully to the dialogue system. Our approach has found the answer from Bach.
Bach, one of the main data architectures for Muse, is composed of multiple linked dialogue datasets. The existing systems select from a very small set of candidate responses because of structural weakness handling dialogue context modeling, so they are evaluated on non-realistic scenarios. However, in Bach system, all of dialogue datasets come from real situations, and one dialogue is stored in a tangible form with closely related other dialogues. In this structure, the dialogue can be assumed to be entities or properties for the purpose of knowledge graph. As we can see from the term of knowledge graph, it is not merely store common sense of knowledge but reflects human’s thoughts. The secret is in Bach. In this article, we will look through a part of Bach’s knowledge/network graph how it is possible to consider a lot of dialogue datasets at the same time, and look Bach’s high quality of data through dialogue quality estimation system.
Bach’s wide Coverage & rapid Acquisition
An optimal dialogue system is one which understands various topics. Bach takes on a number of topics for social communications, and rapidly extends its coverage by adding its own dialogue quality estimation system. In particular, Bach has a strong point to learn a domain-specific and domain-independent information from plenty of information about the extent and degree of data quality. Below is a small part of conversation topics covered in Bach for Musio Easy Mode. Each category has detailed topics in the dialogue form of question-and-answer, and the connections between dialogues are the number of different situations on each topic. The coverage of the system is currently limited, but work is in progress in widen conversation topics. And notable is that all dialogues reflect the knowledge, so it is not merely one data but valuable resource for knowledge-based system.
Bach’s Network Graph connecting Dialogues
Of course, there are some issues to make use of the knowledge-based data and network graph with the available memory and within a reasonable amount of time. So, we utilize them with pre-trained embeddings trained from the customized datasets. It makes the size of the dataset reduce significantly, and estimate the speaker’s question with respect to the size, entities, and relation in the entire dialogues. Below figure is a sample of our network graph analyzed by the context maintenance of dialogue. All of dialogues are connected by the extent of context relatedness, and these connections improve the quality of responses by evaluating the model objectively and finding more related dialogues.