Cognitive Principles in Robust Multimodal Interpretation
J. Y. Chai, Z. Prasov, S. Qu

TL;DR
This paper introduces a cognitively inspired greedy algorithm for resolving multimodal user references (speech and gesture) in conversational interfaces, with the aim of making reference resolution more efficient and multimodal interpretation more robust.
Contribution
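The paper presents a simple, general greedy algorithm for multimodal reference resolution that incorporates two cognitive principles, Conversational Implicature and the Givenness Hierarchy, together with temporal, semantic, and contextual constraints.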
Findings
Efficiently resolves a variety of user references
Demonstrates advantages over previous methods in empirical tests
Potential to improve robustness of multimodal interpretation
Abstract
Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech and gesture. To build effective multimodal interfaces, automated interpretation of user multimodal inputs is important. Inspired by the previous investigation on cognitive status in multimodal human-machine interaction, we have developed a greedy algorithm for interpreting user referring expressions (i.e., multimodal reference resolution). This algorithm incorporates the cognitive principles of Conversational Implicature and Givenness Hierarchy and applies constraints from various sources (e.g., temporal, semantic, and contextual) to resolve references. Our empirical results have shown the advantage of this algorithm in efficiently resolving a variety of user references. Because of its simplicity and generality, this approach has the…
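To make the idea of greedy reference resolution concrete, here is a minimal Python sketch. It is not the authors' algorithm, which this summary does not spell out. It assumes, purely for illustration, that candidate objects are tried in a fixed order of cognitive status (gestured objects first, then objects in the conversation focus, then objects merely visible on the display), and that an expression accepts the first unused candidate passing a semantic type check and a temporal check. The names `Candidate`, `Expression`, `STATUS_RANK`, `compatible`, `resolve`, and the 5-second `max_gap` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    obj_id: str
    obj_type: str   # semantic type, e.g. "house"
    source: str     # "gesture", "conversation", or "display"
    time: float     # when the object was gestured at or mentioned (seconds)


@dataclass(frozen=True)
class Expression:
    text: str
    obj_type: Optional[str]  # None when the expression names no type ("it")
    time: float              # when the expression was spoken (seconds)


# Assumed ordering of cognitive status: lower rank is tried first.
STATUS_RANK = {"gesture": 0, "conversation": 1, "display": 2}


def compatible(expr: Expression, cand: Candidate, max_gap: float = 5.0) -> bool:
    """Apply a semantic constraint and, for gestures, a temporal one."""
    type_ok = expr.obj_type is None or expr.obj_type == cand.obj_type
    # Assumption: a gesture only counts if it is close in time to the speech.
    time_ok = cand.source != "gesture" or abs(expr.time - cand.time) <= max_gap
    return type_ok and time_ok


def resolve(expressions, candidates):
    """Greedily map each expression to the first compatible unused candidate."""
    ordered = sorted(candidates, key=lambda c: (STATUS_RANK[c.source], c.time))
    used = set()
    result = {}
    for expr in sorted(expressions, key=lambda e: e.time):
        for cand in ordered:
            if cand.obj_id not in used and compatible(expr, cand):
                result[expr.text] = cand.obj_id
                used.add(cand.obj_id)
                break  # greedy: commit to the first match, never backtrack
    return result


expressions = [
    Expression("this house", "house", 1.0),
    Expression("it", None, 3.0),
]
candidates = [
    Candidate("h2", "house", "display", 0.0),
    Candidate("t1", "town", "conversation", 0.0),
    Candidate("h1", "house", "gesture", 1.2),
]
result = resolve(expressions, candidates)
print(result)
```

In this toy input, "this house" is matched against the gestured house before any other candidate is considered, and "it" then falls through to the object in the conversation focus. Because each expression commits to its first match, the cost is a single pass over the ordered candidates per expression, which is the efficiency argument usually made for greedy approaches; the trade-off is that an early commitment is never revised.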
