Dependency Induction Through the Lens of Visual Perception
Ruisi Su, Shruti Rijhwani, Hao Zhu, Junxian He, Xinyu Wang, Yonatan Bisk, Graham Neubig

TL;DR
This paper introduces an unsupervised model that uses visual and lexical cues to improve grammar induction, significantly enhancing dependency and constituency parsing performance over text-only models.
Contribution
The paper proposes a joint learning approach for constituency and dependency grammars that leverages word concreteness and visual cues, extending visually grounded syntax models.
Findings
Concreteness improves dependency grammar learning, increasing the direct attachment score (DAS) by over 50%.
Visual semantic role labels enhance constituency parsing accuracy.
The model outperforms existing visually grounded models with smaller grammars.
Abstract
Most previous work on grammar induction focuses on learning phrasal or dependency structure purely from text. However, because the signal provided by text alone is limited, recently introduced visually grounded syntax models make use of multimodal information leading to improved performance in constituency grammar induction. However, as compared to dependency grammars, constituency grammars do not provide a straightforward way to incorporate visual information without enforcing language-specific heuristics. In this paper, we propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars. Our experiments find that concreteness is a strong indicator for learning dependency grammars, improving the direct attachment score (DAS) by over 50% as compared to…
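The headline result is stated in terms of DAS. As a minimal sketch, assuming DAS follows the usual definition of directed attachment accuracy in unsupervised dependency parsing (a token counts as correct only when its predicted head is exactly its gold head), the metric can be computed as below. The function name `directed_attachment_score` and the CoNLL-style head encoding (0 = root, otherwise a 1-indexed token position) are illustrative assumptions, not the authors' evaluation code. Details such as punctuation filtering are not specified here and are left out.

```python
from typing import Sequence


def directed_attachment_score(
    pred_heads: Sequence[Sequence[int]],
    gold_heads: Sequence[Sequence[int]],
) -> float:
    """Hypothetical helper: fraction of tokens whose predicted head equals the gold head.

    Each sentence is a list of head indices, one per token, where 0 marks
    the root and i > 0 points to the i-th token (1-indexed), as in CoNLL-U.
    Direction matters: predicting the reversed arc counts as an error.
    """
    if len(pred_heads) != len(gold_heads):
        raise ValueError("number of sentences differs")
    correct = 0
    total = 0
    for pred, gold in zip(pred_heads, gold_heads):
        if len(pred) != len(gold):
            raise ValueError("sentence length mismatch")
        # Count tokens attached to exactly the right head.
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    # Guard against an empty corpus.
    return correct / total if total else 0.0


if __name__ == "__main__":
    # "dogs chase cats": gold heads are dogs->chase, chase->root, cats->chase.
    gold = [[2, 0, 2]]
    # The prediction wrongly attaches "cats" to "dogs".
    pred = [[2, 0, 1]]
    print(directed_attachment_score(pred, gold))
```

In the toy example two of the three tokens receive the correct head, so the score is 2/3. Scores are pooled over tokens rather than averaged per sentence, which is a common convention but an assumption here.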
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Speech and dialogue systems
