Measuring and Narrowing the Compositionality Gap in Language Models

Ofir Press; Muru Zhang; Sewon Min; Ludwig Schmidt; Noah A. Smith; Mike Lewis

arXiv:2210.03350·cs.CL·October 19, 2023·23 cites

TL;DR

This paper measures how often language models answer every sub-question of a multi-hop question correctly yet fail the composed question (the compositionality gap), and introduces self-ask, a prompting method in which the model explicitly asks and answers follow-up questions, to narrow that gap.

Contribution

It shows that the compositionality gap does not decrease as models in the GPT-3 family grow larger, and proposes self-ask, a prompting technique that improves compositional reasoning by having the model ask explicit follow-up questions.

Findings

01

Larger models improve factual recall faster than compositional reasoning.

02

Elicitive prompting methods like chain of thought reduce the compositionality gap.

03

Self-ask further improves reasoning by explicitly asking follow-up questions.
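
The self-ask idea in finding 03 can be sketched as a prompt scaffold plus a small parser. This is a minimal illustration, not the authors' implementation: the marker strings ("Follow up:", "Intermediate answer:", "So the final answer is:"), the helper names `build_prompt` and `parse_trace`, and the hand-written example completion are all assumptions introduced here, and no model is actually called.

```python
from typing import List, Optional, Tuple

# Assumed scaffold markers; they do not appear in this summary.
FOLLOW_UP = "Follow up:"
INTERMEDIATE = "Intermediate answer:"
FINAL = "So the final answer is:"


def build_prompt(question: str) -> str:
    """Start a self-ask style prompt for one multi-hop question (hypothetical helper)."""
    return f"Question: {question}\nAre follow up questions needed here:"


def parse_trace(completion: str) -> Tuple[List[str], Optional[str]]:
    """Extract the follow-up questions and the final answer from a completion."""
    follow_ups: List[str] = []
    final: Optional[str] = None
    for raw in completion.splitlines():
        line = raw.strip()
        if line.startswith(FOLLOW_UP):
            follow_ups.append(line[len(FOLLOW_UP):].strip())
        elif line.startswith(FINAL):
            final = line[len(FINAL):].strip()
    return follow_ups, final


# Hand-written completion for illustration only; not real model output.
example_completion = """ Yes.
Follow up: When was superconductivity discovered?
Intermediate answer: Superconductivity was discovered in 1911.
Follow up: Who was president of the U.S. in 1911?
Intermediate answer: William Howard Taft.
So the final answer is: William Howard Taft"""

prompt = build_prompt(
    "Who was president of the U.S. when superconductivity was discovered?"
)
follow_ups, final_answer = parse_trace(example_completion)
print(follow_ups)
print(final_answer)
```

Keeping the follow-up questions on their own marked lines is what makes the trace easy to parse; each extracted follow-up could, for instance, be answered by a separate component instead of the model itself.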

Abstract

We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but not generate the overall solution, a ratio we call the compositionality gap. We evaluate this ratio by asking multi-hop questions with answers that require composing multiple facts unlikely to have been observed together during pretraining. In the GPT-3 family of models, as model size increases we show that the single-hop question answering performance improves faster than the multi-hop performance does, therefore the compositionality gap does not decrease. This surprising result suggests that while more powerful models memorize and recall more factual knowledge, they show no corresponding improvement in their ability to perform this kind of…
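
As a rough sketch of the ratio the abstract defines, the snippet below computes a compositionality gap from per-question correctness flags. The `Record` type, its field names, and the toy data are assumptions made for illustration; they are not taken from the paper's code or results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Correctness flags for one multi-hop question (hypothetical structure)."""
    sub_correct: List[bool]   # one flag per sub-question
    multi_hop_correct: bool   # whether the composed question was answered correctly


def compositionality_gap(records: List[Record]) -> Optional[float]:
    """Fraction of questions with all sub-questions correct but the composed answer wrong.

    Only questions whose sub-questions were all answered correctly are counted
    in the denominator. Returns None when no such question exists.
    """
    eligible = [r for r in records if all(r.sub_correct)]
    if not eligible:
        return None
    failures = sum(1 for r in eligible if not r.multi_hop_correct)
    return failures / len(eligible)


# Toy data, invented for this example.
records = [
    Record([True, True], True),
    Record([True, True], False),
    Record([True, False], False),  # excluded: a sub-question was missed
    Record([True, True], False),
]
gap = compositionality_gap(records)
print(gap)
```

Restricting the denominator to questions whose sub-questions were all answered correctly separates failures of composition from failures of factual recall.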

Code & Models

Repositories

ofirpress/self-ask (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Layer Normalization · Cosine Annealing · Residual Connection · Dropout · Weight Decay · Linear Warmup With Cosine Annealing