Geno: A Developer Tool for Authoring Multimodal Interaction on Existing Web Applications
Ritam Jyoti Sarmah, Yunpeng Ding, Di Wang, Cheuk Yin Phipson Lee, Toby Jia-Jun Li, Xiang 'Anthony' Chen

TL;DR
Geno is a developer tool that simplifies adding voice command support to existing web applications by providing high-level workflows and multimodal interaction capabilities, reducing the effort and expertise needed.
Contribution
Geno introduces a unified, high-level framework enabling developers to easily add multimodal voice interactions to web apps without extensive NLP knowledge.
Findings
Developers successfully added voice support to web apps using Geno.
Geno reduces the effort and expertise required for multimodal web app development.
The tool supports context-aware voice commands with GUI references.
Abstract
Supporting voice commands in applications presents significant benefits to users. However, adding such support to existing GUI-based web apps is effort-consuming with a high learning barrier, as shown in our formative study, due to the lack of unified support for creating multimodal interfaces. We present Geno, a developer tool for adding the voice input modality to existing web apps without requiring significant NLP expertise. Geno provides a high-level workflow for developers to specify functionalities to be supported by voice (intents), create language models for detecting intents and the relevant information (parameters) from user utterances, and fulfill the intents by either programmatically invoking the corresponding functions or replaying GUI actions on the web app. Geno further supports multimodal references to GUI context in voice commands (e.g., "move this [event] to next…
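The workflow in the abstract (declare intents, detect them and their parameters in an utterance, then fulfill them by invoking app functions) can be illustrated with a small sketch. Everything here is hypothetical and is not Geno's API: the `Intent` and `GuiContext` types, the to-do functions, and `handleUtterance` are invented for illustration, and a regular expression stands in for the trained language model the abstract refers to. Fulfillment by replaying GUI actions is not shown.

```typescript
// Hypothetical sketch: none of these names come from Geno itself.

// GUI context captured alongside the utterance (e.g. the item the user has selected).
interface GuiContext {
  selectedText?: string;
}

// An intent: one voice-accessible functionality with named parameters.
interface Intent {
  name: string;
  // Assumption: a regex replaces the language model; each capture group is one parameter.
  pattern: RegExp;
  paramNames: string[];
  // Fulfillment by programmatically invoking an application function.
  fulfill: (params: Record<string, string>) => string;
}

// Toy application state and the functions the intents invoke.
const tasks: string[] = ["buy milk", "call Sam"];

function addTask(title: string): string {
  tasks.push(title);
  return `added "${title}"`;
}

function removeTask(title: string): string {
  const index = tasks.indexOf(title);
  if (index === -1) return `no task named "${title}"`;
  tasks.splice(index, 1);
  return `removed "${title}"`;
}

const intents: Intent[] = [
  {
    name: "addTask",
    pattern: /^add task (.+)$/i,
    paramNames: ["title"],
    fulfill: (p) => addTask(p["title"] ?? ""),
  },
  {
    name: "removeTask",
    pattern: /^remove (.+)$/i,
    paramNames: ["title"],
    fulfill: (p) => removeTask(p["title"] ?? ""),
  },
];

// Detect the intent, extract parameters, resolve "this" from GUI context, then fulfill.
function handleUtterance(utterance: string, context: GuiContext = {}): string {
  for (const intent of intents) {
    const match = intent.pattern.exec(utterance.trim());
    if (!match) continue;

    const params: Record<string, string> = {};
    intent.paramNames.forEach((paramName, i) => {
      const spoken = match[i + 1] ?? "";
      // A spoken "this" is a multimodal reference: substitute the selected GUI item.
      const isReference =
        spoken.toLowerCase() === "this" && context.selectedText !== undefined;
      params[paramName] = isReference ? (context.selectedText as string) : spoken;
    });
    return intent.fulfill(params);
  }
  return "no matching intent";
}
```

In this sketch, `handleUtterance("remove this", { selectedText: "buy milk" })` resolves the spoken "this" against the selected item before calling `removeTask`, which is one simple way to combine a voice command with a GUI reference.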
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Multi-Agent Systems and Negotiation
