Reference Resolution and Context Change in Multimodal Situated Dialogue for Exploring Data Visualizations
Abhinav Kumar, Barbara Di Eugenio, Abari Bhattacharya, Jillian Aurisano, Andrew Johnson

TL;DR
This paper investigates reference resolution in multimodal dialogue involving visualizations, proposing a context-aware pipeline and comparing traditional and deep learning models, highlighting transfer learning benefits and generalization issues.
Contribution
It introduces a new annotation scheme and reference resolution pipeline for multimodal visualization dialogue, and evaluates models including CRF and transformer-based methods.
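For readers unfamiliar with CRF sequence labeling, here is a minimal sketch of Viterbi decoding in a linear-chain CRF. It tags an utterance with BIO labels that mark referring expressions. This is an illustration under stated assumptions, not the authors' model. The tag set, the features (a bias term, the current word, the previous word), and all weights are hypothetical and hand-set rather than learned from the annotated corpus.

```python
from typing import Dict, List, Tuple

TAGS: List[str] = ["O", "B-REF", "I-REF"]

# Hypothetical hand-set (feature, tag) weights; a trained CRF would learn these.
EMISSION_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("bias", "O"): 1.0,
    ("word=that", "B-REF"): 2.0,
    ("word=this", "B-REF"): 2.0,
    ("word=chart", "I-REF"): 2.0,
    ("word=graph", "I-REF"): 2.0,
    ("prev=that", "I-REF"): 0.5,
    ("prev=this", "I-REF"): 0.5,
}

# Hypothetical (previous tag, tag) scores: heavily penalize I-REF at the
# start of the utterance or directly after O (the BIO constraint).
TRANSITION_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("START", "I-REF"): -5.0,
    ("O", "I-REF"): -5.0,
}


def features(tokens: List[str], i: int) -> List[str]:
    """Feature strings for token i: a bias term, the word, and the previous word."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return ["bias", f"word={tokens[i].lower()}", f"prev={prev}"]


def emission(tokens: List[str], i: int, tag: str) -> float:
    """Sum of weights for the active features of token i under a given tag."""
    return sum(EMISSION_WEIGHTS.get((f, tag), 0.0) for f in features(tokens, i))


def transition(prev_tag: str, tag: str) -> float:
    return TRANSITION_WEIGHTS.get((prev_tag, tag), 0.0)


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under the linear-chain model."""
    if not tokens:
        return []
    # best[i][t]: score of the best tag sequence for tokens[: i + 1] ending in tag t
    best = [{t: transition("START", t) + emission(tokens, 0, t) for t in TAGS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for t in TAGS:
            prev_t = max(TAGS, key=lambda p: best[i - 1][p] + transition(p, t))
            scores[t] = best[i - 1][prev_t] + transition(prev_t, t) + emission(tokens, i, t)
            pointers[t] = prev_t
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag to recover the full path.
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

With these toy weights, `viterbi("show me that chart again".split())` returns `["O", "O", "B-REF", "I-REF", "O"]`. The transition penalties keep `I-REF` from opening a span. That is the structural constraint a linear-chain CRF captures and a per-token classifier does not.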
Findings
Transfer learning improves deep learning model performance.
CRF outperforms deep learning models on low-resource data.
Contextual information enhances reference resolution accuracy.
Abstract
Reference resolution, which aims to identify entities being referred to by a speaker, is more complex in real-world settings: new referents may be created by processes the agents engage in and/or be salient only because they belong to the shared physical setting. Our focus is on resolving references to visualizations on a large screen display in multimodal dialogue; crucially, reference resolution is directly involved in the process of creating new visualizations. We describe our annotations for user references to visualizations appearing on a large screen via language and hand gesture and also new entity establishment, which results from executing the user request to create a new visualization. We also describe our reference resolution pipeline which relies on an information-state architecture to maintain dialogue context. We report results on detecting and resolving references,…
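The abstract states two things: the pipeline keeps dialogue context in an information-state architecture, and executing a request establishes a new entity. The sketch below is a deliberately simplified, hypothetical illustration of that idea and is not the authors' pipeline. `InformationState`, `Visualization`, the stopword list, and the gesture-first, then word-overlap, then recency ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical function words ignored when matching an expression to descriptions.
STOPWORDS: Set[str] = {"a", "an", "the", "this", "that", "it", "one", "of"}


@dataclass
class Visualization:
    vis_id: int
    description: str


@dataclass
class InformationState:
    """Hypothetical dialogue context: visualizations on screen plus a salience history."""
    on_screen: List[Visualization] = field(default_factory=list)
    history: List[int] = field(default_factory=list)  # vis_ids, most recent last
    next_id: int = 1

    def create(self, description: str) -> Visualization:
        """New entity establishment: executing a request adds a fresh, salient referent."""
        vis = Visualization(self.next_id, description)
        self.next_id += 1
        self.on_screen.append(vis)
        self.history.append(vis.vis_id)
        return vis

    def resolve(self, expression: str, pointed_at: Optional[int] = None) -> Optional[Visualization]:
        """Resolve a referring expression, optionally with a pointing gesture.

        Hypothetical priority order: gesture, then word overlap, then recency.
        """
        by_id: Dict[int, Visualization] = {v.vis_id: v for v in self.on_screen}
        target: Optional[Visualization] = None
        if pointed_at is not None and pointed_at in by_id:
            # 1. A pointing gesture at an on-screen visualization takes priority.
            target = by_id[pointed_at]
        else:
            content = {w for w in expression.lower().split() if w not in STOPWORDS}
            recent = [by_id[i] for i in reversed(self.history) if i in by_id]
            # 2. Most recent visualization whose description shares a content word.
            for vis in recent:
                if content & set(vis.description.lower().split()):
                    target = vis
                    break
            # 3. Otherwise fall back to the most recently salient visualization.
            if target is None and recent:
                target = recent[0]
        if target is not None:
            self.history.append(target.vis_id)  # a resolved referent becomes salient again
        return target
```

For example, after creating a bar chart and then a heat map, `resolve("that one")` picks the heat map (most recent), and `resolve("the bar chart")` picks the bar chart by word overlap. These hand-written rules are only a stand-in. The paper evaluates learned models such as CRF and transformer-based methods.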
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Natural Language Processing Techniques
Methods: Conditional Random Field
