Comparing Trajectory and Vision Modalities for Verb Representation
Dylan Ebert, Chen Sun, Ellie Pavlick

TL;DR
This study compares 2D visual and 3D trajectory modalities for verb representation. It finds that 2D images perform comparably to 3D trajectories in differentiating verb meanings, which challenges the assumption that richer 3D data yields better verb representations.
Contribution
The paper provides empirical evidence that 2D visual data can be as effective as 3D trajectory data for learning verb semantics in multimodal models.
Findings
2D visual modalities perform similarly to 3D trajectories in verb differentiation
This result challenges the assumption that richer 3D data always improves language representations
Results suggest reconsidering the emphasis on 3D data in multimodal NLP models
Abstract
Three-dimensional trajectories, or the 3D position and rotation of objects over time, have been shown to encode key aspects of verb semantics (e.g., the meanings of roll vs. slide). However, most multimodal models in NLP use 2D images as representations of the world. Given the importance of 3D space in formal models of verb semantics, we expect that these 2D images would result in impoverished representations that fail to capture nuanced differences in meaning. This paper tests this hypothesis directly in controlled experiments. We train self-supervised image and trajectory encoders, and then evaluate them on the extent to which each learns to differentiate verb concepts. Contrary to our initial expectations, we find that 2D visual modalities perform similarly well to 3D trajectories. While further work should be conducted on this question, our initial findings challenge the…
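As an illustration of what "evaluating the extent to which each encoder learns to differentiate verb concepts" could look like, here is a minimal sketch that scores two sets of embeddings with the same nearest-centroid probe. It is not the authors' evaluation protocol, which this summary does not specify. The function names (`make_toy_embeddings`, `nearest_centroid_accuracy`), the probe itself, and the synthetic data are all assumptions made for illustration; the numbers it prints say nothing about the paper's results.

```python
import numpy as np


def make_toy_embeddings(separation, n_verbs=4, per_verb=50, dim=8, seed=0):
    """Hypothetical stand-in for encoder outputs: one Gaussian cluster per verb."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_verbs), per_verb)
    # Place each verb's mean on its own axis, `separation` units from the origin.
    means = np.zeros((n_verbs, dim))
    means[np.arange(n_verbs), np.arange(n_verbs)] = separation
    embeddings = means[labels] + rng.normal(scale=0.5, size=(len(labels), dim))
    return embeddings, labels


def nearest_centroid_accuracy(embeddings, labels, seed=0):
    """Fit one centroid per verb on a random half, then score the other half."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    split = len(labels) // 2
    train, test = order[:split], order[split:]

    verbs = np.unique(labels[train])
    centroids = np.stack(
        [embeddings[train][labels[train] == v].mean(axis=0) for v in verbs]
    )
    # Euclidean distance from every held-out embedding to every verb centroid.
    dists = np.linalg.norm(
        embeddings[test][:, None, :] - centroids[None, :, :], axis=-1
    )
    predicted = verbs[dists.argmin(axis=1)]
    return float((predicted == labels[test]).mean())


# Synthetic placeholders for the two modalities (NOT real encoder outputs).
image_emb, verb_labels = make_toy_embeddings(separation=10.0, seed=1)
trajectory_emb, _ = make_toy_embeddings(separation=10.0, seed=2)

image_acc = nearest_centroid_accuracy(image_emb, verb_labels)
trajectory_acc = nearest_centroid_accuracy(trajectory_emb, verb_labels)
print(f"image probe accuracy:      {image_acc:.2f}")
print(f"trajectory probe accuracy: {trajectory_acc:.2f}")
```

Applying the same probe and the same train/test split to both modalities keeps the comparison controlled: any difference in accuracy then reflects the embeddings rather than the evaluation setup.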
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
