Semantic Composition in Visually Grounded Language Models

Rohan Pandey

arXiv:2305.16328 · cs.CL · May 29, 2023 · 1 citation

TL;DR

This paper investigates how visually grounded language models represent compositional semantics, introduces new benchmarks and methods to measure and improve this ability, and explores connections to cognitive sciences.

Contribution

It introduces novel benchmarks, measures, and techniques to evaluate and enhance compositional semantics in vision-language models.

Findings

01. Visual question answering benchmark for compositionality
02. Measures of compositional ability in sentence embeddings
03. Methods to improve vision-language semantic composition

Abstract

What is sentence meaning and its ideal representation? Much of the expressive power of human language derives from semantic composition, the mind's ability to represent meaning hierarchically & relationally over constituents. At the same time, much sentential meaning is outside the text and requires grounding in sensory, motor, and experiential modalities to be adequately learned. Although large language models display considerable compositional ability, recent work shows that visually-grounded language models drastically fail to represent compositional structure. In this thesis, we explore whether & how models compose visually grounded semantics, and how we might improve their ability to do so. Specifically, we introduce 1) WinogroundVQA, a new compositional visual question answering benchmark, 2) Syntactic Neural Module Distillation, a measure of compositional ability in sentence…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
