Comparing Trajectory and Vision Modalities for Verb Representation

Dylan Ebert; Chen Sun; Ellie Pavlick

arXiv:2303.12737 · cs.CV · March 23, 2023 · 1 citation

Open Access

TL;DR

This study compares 2D visual and 3D trajectory modalities for verb representation. It finds that 2D images perform comparably to 3D trajectories at differentiating verb meanings, which challenges the assumption that richer input data yields better language representations.

Contribution

The paper provides empirical evidence that 2D visual data can be as effective as 3D trajectory data for learning verb semantics in multimodal models.

Findings

01 2D visual modalities perform similarly to 3D trajectories at differentiating verb concepts.

02 This result challenges the assumption that richer 3D data always improves language representations.

03 The results suggest reconsidering the emphasis on 3D data in multimodal NLP models.

Abstract

Three-dimensional trajectories, or the 3D position and rotation of objects over time, have been shown to encode key aspects of verb semantics (e.g., the meanings of roll vs. slide). However, most multimodal models in NLP use 2D images as representations of the world. Given the importance of 3D space in formal models of verb semantics, we expect that these 2D images would result in impoverished representations that fail to capture nuanced differences in meaning. This paper tests this hypothesis directly in controlled experiments. We train self-supervised image and trajectory encoders, and then evaluate them on the extent to which each learns to differentiate verb concepts. Contrary to our initial expectations, we find that 2D visual modalities perform similarly well to 3D trajectories. While further work should be conducted on this question, our initial findings challenge the…

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
