Can Language Models Understand Physical Concepts?

Lei Li; Jingjing Xu; Qingxiu Dong; Ce Zheng; Qi Liu; Lingpeng Kong; Xu Sun

arXiv:2305.14057 · cs.CL · May 24, 2023 · 2 cites

TL;DR

This paper investigates whether language models (LMs) understand physical concepts. It introduces the VEC benchmark, analyzes how LMs perform on it, and proposes a method that transfers embodied knowledge from vision-language models to LMs to improve their understanding.

Contribution

The paper introduces the VEC benchmark for physical concept understanding, analyzes LM performance, and proposes a distillation method to transfer embodied knowledge from vision-language models.
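Knowledge distillation of the kind named here can be sketched with the standard temperature-softened KL objective. This is a generic logit-distillation loss, not necessarily the paper's distillation method. It assumes, hypothetically, that the vision-language teacher produces one score per candidate answer (for example, an image-text similarity) and that the LM student is trained to match the teacher's distribution over those answers. The function names, the temperature of 2.0 and the toy scores are all illustrative assumptions.

```python
from __future__ import annotations

import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) between temperature-softened distributions.

    Multiplying by temperature**2 is the usual convention that keeps gradient
    magnitudes comparable across temperatures. Generic recipe, not the paper's.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical scores over the candidate answers ["hot", "cold"] for one question:
# the teacher (a vision-language model) is confident, the student LM is not.
teacher = [3.0, -1.0]
student = [0.2, 0.1]
print(distillation_loss(teacher, student))  # positive: the student is pulled toward the teacher
print(distillation_loss(teacher, teacher))  # 0.0 when the two distributions match
```

Minimizing this loss with respect to the student's parameters moves the LM's answer distribution toward the teacher's; a higher temperature exposes more of the teacher's relative preferences among wrong answers.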

Findings

01. Scaling up LMs improves their understanding of some visual concepts, but the scaling law does not hold for every basic concept (e.g., mass).

02. Vision-augmented LMs such as CLIP and BLIP achieve human-level understanding of embodied concepts.

03. A distillation method that transfers embodied knowledge from vision-language models to LMs significantly improves LM performance.

Abstract

Language models (LMs) are gradually becoming general-purpose interfaces in the interactive and embodied world, where the understanding of physical concepts is an essential prerequisite. However, it is not yet clear whether LMs can understand physical concepts in the human world. To investigate this, we design a benchmark, VEC, that covers the tasks of (i) Visual concepts, such as the shape and material of objects, and (ii) Embodied Concepts, learned from interaction with the world, such as the temperature of objects. Our zero-shot (few-shot) prompting results show that the understanding of certain visual concepts emerges as LMs are scaled up, but there are still basic concepts to which the scaling law does not apply. For example, OPT-175B performs close to humans with a zero-shot accuracy of 85% on the material concept, yet behaves like random guessing on the mass concept. Instead, vision-augmented…
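The zero-shot (few-shot) prompting evaluation and the comparison against random guessing can be sketched as follows. This is a minimal illustration assuming a multiple-choice format in which the LM's log-probability of each candidate answer is compared and the highest-scoring one is taken as the prediction. The scorer interface `score_fn`, the helper names, the prompts and the toy log-probabilities are hypothetical; none of them is taken from the VEC benchmark or the tobiaslee/vec repository.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Hypothetical interface: score_fn(prompt, option) returns a model's log-probability
# of `option` as the continuation of `prompt`. Any real LM scorer could be plugged in.
ScoreFn = Callable[[str, str], float]


def zero_shot_predict(score_fn: ScoreFn, prompt: str, options: Sequence[str]) -> int:
    """Index of the highest-scoring option for the given prompt."""
    scores = [score_fn(prompt, option) for option in options]
    return max(range(len(options)), key=lambda i: scores[i])


def few_shot_prompt(demos: Sequence[tuple[str, str]], prompt: str) -> str:
    """Prepend k solved (question, answer) demonstrations; k = 0 gives zero-shot."""
    return "".join(f"{q} {a}\n" for q, a in demos) + prompt


def accuracy(score_fn: ScoreFn, examples, demos=()) -> float:
    """Fraction of (prompt, options, gold_index) examples answered correctly."""
    correct = 0
    for prompt, options, gold in examples:
        full_prompt = few_shot_prompt(demos, prompt)
        correct += zero_shot_predict(score_fn, full_prompt, options) == gold
    return correct / len(examples)


def random_baseline(examples) -> float:
    """Expected accuracy of uniform random guessing over each example's options."""
    return sum(1 / len(options) for _, options, _ in examples) / len(examples)


# Toy, made-up log-probabilities standing in for a real LM (illustration only).
TOY_LOGPROBS = {"a feather": -2.0, "a brick": -1.0, "wood": -3.0, "metal": -0.5,
                "steam": -4.0, "ice": -1.0}


def toy_score(prompt: str, option: str) -> float:
    return TOY_LOGPROBS[option]  # ignores the prompt; a real LM would not


examples = [
    ("Which is heavier, a feather or a brick? Answer:", ["a feather", "a brick"], 1),
    ("A coin is usually made of", ["wood", "metal"], 1),
    ("Which is hotter, steam or ice? Answer:", ["steam", "ice"], 0),
]

print(accuracy(toy_score, examples))   # 2 of 3 correct
print(random_baseline(examples))       # 0.5 for two-option questions
```

Reporting the random baseline next to accuracy is what makes a statement like "behaves like random guessing" checkable: on two-option questions, an accuracy near 0.5 carries no evidence that the model understands the concept.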

Code & Models

Repositories

tobiaslee/vec (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training · BLIP: Bootstrapping Language-Image Pre-training