Turning Speech Into Scripts
Manny Rayner, Beth Ann Hockey, Frankie James

TL;DR
This paper presents an architecture for converting spoken language into executable scripts for dialogue interfaces. The input passes through successive layers of representation, and each layer also emits a "meta-output" describing its transformation, which is used to handle diverse linguistic and dialogue phenomena.
Contribution
It introduces a novel layered translation process with meta-outputs, enabling effective speech-to-script conversion in dialogue systems.
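The layered translation with meta-outputs can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stage names, the `StageResult` type, and the toy pronoun rule that always maps "it" to "door" are hypothetical. Each stage returns its output together with a meta-output describing what it did, and the pipeline collects the meta-outputs by stage name.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StageResult:
    output: Any                                    # passed on to the next stage
    meta: list[str] = field(default_factory=list)  # describes the transformation performed


Stage = Callable[[Any], StageResult]


def run_pipeline(stages: list[tuple[str, Stage]], signal: Any) -> tuple[Any, dict[str, list[str]]]:
    """Feed the input through each stage in order, collecting meta-outputs per stage."""
    current = signal
    metas: dict[str, list[str]] = {}
    for name, stage in stages:
        result = stage(current)
        current = result.output
        metas[name] = result.meta
    return current, metas


# Hypothetical toy stages standing in for linguistic, dialogue, and domain knowledge.
def linguistic(words: str) -> StageResult:
    verb, *rest = words.split()  # crude "parse": a verb followed by its object
    return StageResult({"verb": verb, "object": " ".join(rest)}, [f"parsed verb '{verb}'"])


def dialogue(lf: dict) -> StageResult:
    if lf["object"] == "it":  # assumed context: the most salient object is "door"
        return StageResult({**lf, "object": "door"}, ["resolved 'it' -> 'door'"])
    return StageResult(lf, [])


def domain(lf: dict) -> StageResult:
    return StageResult([f"{lf['verb']} {lf['object']}"], ["emitted 1 command"])


script, metas = run_pipeline(
    [("linguistic", linguistic), ("dialogue", dialogue), ("domain", domain)], "close it"
)
print(script)             # ['close door']
print(metas["dialogue"])  # ["resolved 'it' -> 'door'"]
```

Because the pronoun substitution is reported as meta-output rather than silently applied, a later component can tell the user how "it" was interpreted without re-deriving it.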
Findings
A prototype was implemented for the Personal Satellite Assistant
Pronoun resolution, correction of user misconceptions, and script optimization are handled within one framework
The layered architecture and the output/meta-output distinction make the speech-to-script translation clearer and more modular
Abstract
We describe an architecture for implementing spoken natural language dialogue interfaces to semi-autonomous systems, in which the central idea is to transform the input speech signal through successive levels of representation corresponding roughly to linguistic knowledge, dialogue knowledge, and domain knowledge. The final representation is an executable program in a simple scripting language equivalent to a subset of Cshell. At each stage of the translation process, an input is transformed into an output, producing as a byproduct a "meta-output" which describes the nature of the transformation performed. We show how consistent use of the output/meta-output distinction permits a simple and perspicuous treatment of apparently diverse topics including resolution of pronouns, correction of user misconceptions, and optimization of scripts. The methods described have been concretely…
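The claim that meta-outputs give a uniform treatment of misconception correction and script optimization can be illustrated with a small sketch. Everything here is hypothetical: the `open`/`close` command format, the `EFFECTS` table, and the device names are assumptions, and scripts are modeled as Python lists of command strings rather than real C shell. The optimizer drops commands whose effect already holds and records each drop as meta-output; a drop caused by the current world state, rather than by repetition within the script, suggests the user may hold a mistaken belief.

```python
from dataclasses import dataclass

# Hypothetical mapping from a command verb to the device state it produces.
EFFECTS = {"open": "open", "close": "closed"}


@dataclass
class Result:
    output: list[str]  # optimized script
    meta: list[str]    # human-readable description of what the optimizer did


def optimize(script: list[str], world: dict[str, str]) -> Result:
    """Drop commands whose effect already holds, recording each drop as meta-output."""
    state = dict(world)  # simulate the script without mutating the caller's world
    kept: list[str] = []
    meta: list[str] = []
    for cmd in script:
        verb, _, device = cmd.partition(" ")
        effect = EFFECTS.get(verb)
        if effect is not None and state.get(device) == effect:
            if world.get(device) == effect:
                meta.append(f"{device} is already {effect}")  # possible user misconception
            else:
                meta.append(f"dropped repeated '{cmd}'")      # pure script optimization
            continue
        kept.append(cmd)
        if effect is not None:
            state[device] = effect
    return Result(kept, meta)


def respond(meta: list[str]) -> str:
    """Turn meta-output into spoken feedback for the user."""
    return "; ".join(meta) if meta else "OK"


world = {"door_1": "open", "door_2": "closed"}
result = optimize(["open door_1", "open door_2", "open door_2"], world)
print(result.output)         # ['open door_2']
print(respond(result.meta))  # door_1 is already open; dropped repeated 'open door_2'
```

Keeping the report separate from the script means the executor never sees the explanation, while the dialogue manager can still use it to correct the user.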
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Language, Metaphor, and Cognition
