A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions
Takuma Udagawa, Takato Yamazaki, Akiko Aizawa

TL;DR
This paper introduces a detailed linguistic analysis framework for visually grounded dialogues, focusing on spatial expressions within the OneCommon Corpus to evaluate model understanding of linguistic structures.
Contribution
It provides new spatial-expression annotation for 600 OneCommon dialogues and a framework for analyzing it, revealing models' strengths and weaknesses in linguistic comprehension.
Findings
Annotation captures predicate-argument structures, modification, and ellipsis.
Experiments assess how well models recognize spatial expressions and their linguistic structures.
Framework aids in diagnosing linguistic understanding in visual dialogue models.
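To make the structures in the findings above concrete, here is a minimal sketch of how one spatial expression might be represented, with a predicate, its arguments, modifiers, and an elided argument. The schema and the names SpatialExpression, Mention, and has_ellipsis are hypothetical and for illustration only. They are not the paper's annotation format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema for illustration only; not the paper's annotation format.
@dataclass
class Mention:
    text: str              # surface span of the argument, e.g. "the black dot"
    elided: bool = False   # True if the argument is left implicit in the utterance

@dataclass
class SpatialExpression:
    predicate: str                                       # spatial relation, e.g. "left of"
    arguments: List[Mention]                             # entities the relation holds between
    modifiers: List[str] = field(default_factory=list)   # e.g. degree words such as "slightly"

    def has_ellipsis(self) -> bool:
        # An expression involves ellipsis if any of its arguments is elided.
        return any(arg.elided for arg in self.arguments)

# "the black dot is slightly to the left": the reference object is left implicit.
expr = SpatialExpression(
    predicate="left of",
    arguments=[Mention("the black dot"), Mention("", elided=True)],
    modifiers=["slightly"],
)
print(expr.has_ellipsis())  # True
```

Keeping the elided argument as an explicit slot, rather than dropping it, lets the predicate retain its full argument structure even when the speaker omits part of it.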
Abstract
Recent models achieve promising results in visually grounded dialogues. However, existing datasets often contain undesirable biases and lack sophisticated linguistic analyses, which make it difficult to understand how well current models recognize their precise linguistic structures. To address this problem, we make two design choices: first, we focus on OneCommon Corpus (Udagawa and Aizawa, 2019; 2020), a simple yet challenging common grounding dataset which contains minimal bias by design. Second, we analyze their linguistic structures based on spatial expressions and provide comprehensive and reliable annotation for 600 dialogues. We show that our annotation captures important linguistic structures including predicate-argument structure, modification and ellipsis. In our experiments, we assess the model's understanding of these structures through reference…
Taxonomy
Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Natural Language Processing Techniques
