Things not Written in Text: Exploring Spatial Commonsense from Visual Signals

Xiao Liu; Da Yin; Yansong Feng; Dongyan Zhao

arXiv:2203.08075 · cs.CL · April 28, 2022 · 1 citation

TL;DR

This paper investigates whether visual signals improve spatial commonsense reasoning in AI models, finding that image synthesis models outperform language models in learning and applying spatial knowledge.

Contribution

The study introduces a new spatial commonsense benchmark and demonstrates that image synthesis models learn spatial relationships better than text-based models.

Findings

1. Image synthesis models outperform PLMs in spatial reasoning.
2. Spatial knowledge from image models aids natural language understanding.
3. The proposed benchmark effectively evaluates spatial commonsense.

Abstract

Spatial commonsense, the knowledge about spatial position and relationship between objects (like the relative size of a lion and a girl, and the position of a boy relative to a bicycle when cycling), is an important part of commonsense knowledge. Although pretrained language models (PLMs) succeed in many NLP tasks, they are shown to be ineffective in spatial commonsense reasoning. Starting from the observation that images are more likely to exhibit spatial commonsense than texts, we explore whether models with visual signals learn more spatial commonsense than text-based PLMs. We propose a spatial commonsense benchmark that focuses on the relative scales of objects, and the positional relationship between people and objects under different actions. We probe PLMs and models with visual signals, including vision-language pretrained models and image synthesis models, on this benchmark, and…
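The abstract mentions probing PLMs on the relative scale of objects. As a minimal sketch, assuming a probe that asks a model to score a "larger" sentence against its "smaller" counterpart and predicts whichever it scores higher (a common probing setup, not necessarily the paper's exact protocol), accuracy on such items could be computed as below. The sentence template, `ScaleItem`, `predict_larger`, `accuracy`, and the toy scorer are all hypothetical; a real run would substitute a model's sentence score, such as its log-likelihood.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical template; the benchmark's real prompts may differ.
TEMPLATE = "A {a} is {rel} than a {b}."


@dataclass(frozen=True)
class ScaleItem:
    """One hypothetical relative-scale probe: is obj_a larger than obj_b?"""
    obj_a: str
    obj_b: str
    a_is_larger: bool  # gold label


def predict_larger(item: ScaleItem, score: Callable[[str], float]) -> bool:
    """Predict 'a is larger' when the 'larger' sentence outscores the 'smaller' one.

    Ties are counted as 'not larger'.
    """
    larger = TEMPLATE.format(a=item.obj_a, rel="larger", b=item.obj_b)
    smaller = TEMPLATE.format(a=item.obj_a, rel="smaller", b=item.obj_b)
    return score(larger) > score(smaller)


def accuracy(items: Sequence[ScaleItem], score: Callable[[str], float]) -> float:
    """Fraction of probe items whose prediction matches the gold label."""
    if not items:
        raise ValueError("no probe items")
    correct = sum(predict_larger(it, score) == it.a_is_larger for it in items)
    return correct / len(items)


# Toy stand-in for a model: hypothetical size ranks instead of real PLM scores.
SIZES = {"lion": 3, "girl": 2, "mouse": 0}


def toy_score(sentence: str) -> float:
    """Return 1.0 if the templated sentence is true under SIZES, else 0.0."""
    words = sentence.rstrip(".").split()
    a, rel, b = words[1], words[3], words[6]
    true = SIZES[a] > SIZES[b] if rel == "larger" else SIZES[a] < SIZES[b]
    return 1.0 if true else 0.0


items = [ScaleItem("lion", "girl", True), ScaleItem("mouse", "girl", False)]
print(accuracy(items, toy_score))
```

Passing the scorer in as a function keeps the probing logic independent of any particular model, so the same loop could compare a text-only PLM against a model with visual signals.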

Code & Models

Repositories

xxxiaol/spatial-commonsense
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Speech and dialogue systems