Pretraining on Interactions for Learning Grounded Affordance Representations
Jack Merullo, Dylan Ebert, Carsten Eickhoff, Ellie Pavlick

TL;DR
This paper demonstrates that training neural networks on simulated object interactions enables the learning of grounded affordance representations, bridging the gap between cognitive science concepts and modern language models.
Contribution
It introduces a novel approach that uses training on 3D simulations to learn grounded affordance representations, integrating cognitive science with deep learning.
Findings
Models trained on 3D simulations outperform conventional 2D computer vision models trained on a similar task.
Latent representations differentiate observed and unobserved affordances.
Concept features align with expected semantic properties.
Abstract
Lexical semantics and cognitive science point to affordances (i.e. the actions that objects support) as critical for understanding and representing nouns and verbs. However, study of these semantic features has not yet been integrated with the "foundation" models that currently dominate language representation research. We hypothesize that predictive modeling of object state over time will result in representations that encode object affordance information "for free". We train a neural network to predict objects' trajectories in a simulated interaction and show that our network's latent representations differentiate between both observed and unobserved affordances. We find that models trained using 3D simulations from our SPATIAL dataset outperform conventional 2D computer vision models trained on a similar task, and, on initial inspection, that differences between concepts correspond…
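The abstract's core idea is that a model trained only to predict how an object moves can end up encoding what the object affords. A toy sketch can illustrate this. Everything below is an assumption made for illustration, not the authors' setup. It uses a hypothetical 1D world in which a "rollable" object keeps 98% of its speed per step and a non-rollable one keeps 70%. A tiny NumPy MLP stands in for the paper's network, and a logistic-regression probe is trained on the MLP's hidden layer. The predictor never sees the affordance label. The probe tests whether that label can still be read out of the learned latent.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(rollable: bool, steps: int = 8) -> np.ndarray:
    """Hypothetical toy world (not the SPATIAL dataset): an object is pushed
    once and slows down. Rollable objects keep 98% of their speed per step,
    others keep 70%."""
    decay = 0.02 if rollable else 0.30
    v = rng.uniform(0.5, 1.5)              # random initial push
    vels = []
    for _ in range(steps):
        v *= 1.0 - decay
        vels.append(v)
    return np.array(vels)

def make_dataset(n: int, window: int = 4):
    labels = rng.integers(0, 2, size=n)    # 1 = rollable; never shown to the predictor
    trajs = np.stack([simulate(bool(y)) for y in labels])
    X = trajs[:, :window]                  # observed velocities
    y_next = trajs[:, window:window + 1]   # next velocity to predict
    return X, y_next, labels

def train_predictor(X, y_next, hidden=8, lr=0.05, epochs=2000):
    """One-hidden-layer MLP trained with full-batch gradient descent on MSE."""
    n, k = X.shape
    W1 = rng.normal(0.0, 0.5, size=(k, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                # latent representation
        err = h @ W2 + b2 - y_next
        g_out = 2.0 * err / n                   # d(MSE)/d(prediction)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)   # backprop through tanh
        W2 -= lr * (h.T @ g_out); b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_h);   b1 -= lr * g_h.sum(axis=0)
    return W1, b1, W2, b2

def train_probe(H, labels, lr=0.5, epochs=2000):
    """Logistic-regression probe that reads the affordance label off the latent."""
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        g = p - labels
        w -= lr * (H.T @ g) / len(labels)
        b -= lr * g.mean()
    return w, b

X_tr, y_tr, lab_tr = make_dataset(400)
X_te, y_te, lab_te = make_dataset(200)
W1, b1, W2, b2 = train_predictor(X_tr, y_tr)   # trained on dynamics only

def encode(X):
    return np.tanh(X @ W1 + b1)

w, b = train_probe(encode(X_tr), lab_tr)
probe_acc = np.mean(((encode(X_te) @ w + b) > 0) == lab_te)
mse = np.mean((encode(X_te) @ W2 + b2 - y_te) ** 2)
print(f"next-velocity MSE: {mse:.4f}  affordance probe accuracy: {probe_acc:.2f}")
```

Linear probing is one common way to test whether a property is present in a learned representation. The text here does not state whether the paper uses this exact readout, network, or training objective.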
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Language and cultural evolution
