A Formal Analysis of Multimodal Referring Strategies Under Common Ground
Nikhil Krishnaswamy, James Pustejovsky

TL;DR
This paper analyzes how gesture and language combine in computationally generated definite referring expressions, revealing formal semantic properties that can help train better models of viewer judgment and may support generating more natural references.
Contribution
It provides a formal semantic analysis of multimodal referring expressions involving gesture and language, highlighting their interaction within common ground.
Findings
Formal properties of gesture-language interactions identified
Insights for training models to predict viewer judgments
Potential improvements in generating natural referring expressions
Abstract
In this paper, we present an analysis of computationally generated mixed-modality definite referring expressions that combine gesture and linguistic descriptions. In doing so, we expose some striking formal semantic properties of the interactions between gesture and language, conditioned on the introduction of content into the common ground between the (computational) speaker and (human) viewer. We demonstrate how these formal features can contribute to training better models to predict viewer judgment of referring expressions, and potentially to the generation of more natural and informative referring expressions.
Taxonomy
Topics: Speech and dialogue systems · Multi-Agent Systems and Negotiation · Natural Language Processing Techniques
