Summary of Jurafsky-Martin (Chapter 19)
This chapter discusses the basic structure of spoken dialogue systems (also known as spoken language systems or conversational agents). These systems are programs that let people communicate with machines in natural spoken language to accomplish a variety of tasks, from booking flights on automated phone systems to having conversations about things like sports and weather. Research on these systems depends heavily on our understanding of human conversation and dialogue practices.
Creating a spoken dialogue system that is efficient, natural, and has a high task success rate takes several pieces working together. There are five elements to consider: automatic speech recognition (ASR), natural language understanding (NLU), dialogue management, natural language generation, and speech synthesis. ASR takes audio input and transcribes it into a string of words, guided by whatever language model it has been given. NLU, which generally relies on a semantic grammar, turns those words into a meaning representation suited to the task, for example by filling the slots of a frame with the details relevant to the conversation topic.
Dialogue management controls the structure of the conversation: it keeps track of what has been said so far and decides how the system should respond to each input. Natural language generation then decides what content to express and how to word it so that the response stays appropriate to the conversation. Finally, speech synthesis converts the generated words into a spoken waveform, either to prompt the user for more information or to deliver an answer.
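To make the data flow between the five components concrete, here is a minimal sketch in Python. It is an illustration only, not code from the chapter: every function name, the three-city list, and the flight-booking prompts are assumptions, and each stage is a toy stand-in (the "audio" is already text, and the "waveform" is just bytes).

```python
import re

# Hypothetical toy vocabulary; a real system would use a much larger lexicon.
CITY = "(boston|denver|chicago)"

def recognize(audio: str) -> str:
    """ASR stand-in: the 'audio' is already text; a real recognizer decodes a waveform."""
    return audio.lower().strip()

def understand(words: str) -> dict:
    """NLU stand-in: a tiny pattern-based 'semantic grammar' that fills frame slots."""
    slots = {}
    origin = re.search(rf"\bfrom {CITY}\b", words)
    destination = re.search(rf"\bto {CITY}\b", words)
    if origin:
        slots["origin"] = origin.group(1)
    if destination:
        slots["destination"] = destination.group(1)
    return slots

def manage(slots: dict) -> tuple:
    """Dialogue manager stand-in: pick the next dialogue act from the filled slots."""
    for name in ("origin", "destination"):
        if name not in slots:
            return ("request", name)
    return ("confirm", slots)

def generate(act: tuple) -> str:
    """NLG stand-in: turn a dialogue act into a sentence using fixed templates."""
    kind, arg = act
    if kind == "request":
        return f"What is your {arg} city?"
    return f"Flying from {arg['origin'].title()} to {arg['destination'].title()}, correct?"

def synthesize(text: str) -> bytes:
    """TTS stand-in: returns bytes where a real synthesizer would return a waveform."""
    return text.encode("utf-8")

def respond(audio: str) -> bytes:
    """Run one user turn through all five stages in order."""
    return synthesize(generate(manage(understand(recognize(audio)))))
```

Calling `respond("I want to fly to Denver")` fills only the destination slot, so the manager asks for the origin city. The point of the sketch is the hand-off between stages, not the quality of any single stage.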
While this list of five components seems relatively straightforward, a variety of factors in human speech and conversation greatly complicate them. To cope with these variations, several different models of dialogue management have been created. Finite-state and frame-based managers function very well for certain tasks, but not so well for others (think about the automated telephone system mentioned earlier). Other advances allow more sophisticated conversational components to be considered. Advanced frameworks like the Markov Decision Process and Belief-Desire-Intention models aim to incorporate more of the nuances of human conversation and give dialogue a more natural, conversational feel.
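As a rough illustration of the frame-based approach, the sketch below keeps a frame of slots and asks only for whatever is still missing, so a user can supply several details in one turn instead of following a fixed question order as in a finite-state design. The class name, slot names, and question wording are all hypothetical, and the input is assumed to be slots that some earlier understanding step has already extracted.

```python
from typing import Optional

class FrameManager:
    """Toy frame-based dialogue manager (hypothetical names and prompts)."""

    # One question per slot; dictionary order is the order questions are asked.
    QUESTIONS = {
        "origin": "What city are you leaving from?",
        "destination": "Where are you going?",
        "date": "What day would you like to travel?",
    }

    def __init__(self) -> None:
        # Every slot starts out empty.
        self.frame = {slot: None for slot in self.QUESTIONS}

    def update(self, slots: dict) -> None:
        """Store any slots the user supplied, ignoring names the frame does not know."""
        for name, value in slots.items():
            if name in self.frame:
                self.frame[name] = value

    def next_prompt(self) -> Optional[str]:
        """Return the question for the first empty slot, or None when the frame is full."""
        for name, question in self.QUESTIONS.items():
            if self.frame[name] is None:
                return question
        return None
```

If the user volunteers a destination and a date up front, `update` records both and `next_prompt` asks only for the origin. That flexibility is what separates this design from a finite-state manager, which would walk through its questions in a fixed sequence.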
Ultimately, the nuances of human conversation, such as pauses, implication, grounding, and prosody, must be integrated into spoken dialogue systems more effectively. What is often overlooked in human-to-human conversation is really the crux of the developments outlined in this chapter. Certain models may work well in isolated conversational situations, but as dialogue increases in complexity, so must the models and architectures used to generate natural conversation between humans and machines.
Jurafsky, Daniel, and James H. Martin. “Chapter 19 Dialogue and Conversational Agents.” Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, New Jersey: Prentice Hall, 2009. Print.