Measuring and Narrowing the Compositionality Gap in Language Models
Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis

TL;DR
This paper examines the limitations of large language models in compositional reasoning, introduces methods like self-ask and structured prompting to reduce the compositionality gap, and demonstrates improved reasoning performance.
Contribution
It identifies the persistent compositionality gap in language models and proposes novel prompting techniques, including self-ask, to enhance their reasoning capabilities.
Findings
Larger models improve factual recall faster than compositional reasoning.
Elicitive prompting methods like chain of thought reduce the compositionality gap.
Self-ask further improves reasoning by explicitly asking follow-up questions.
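The self-ask idea in the last finding can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: the marker strings (`Follow up:`, `Intermediate answer:`, `So the final answer is:`), the `next_line` and `answer_sub` callables, and the scripted stand-in model with its placeholder facts are all assumptions introduced here so that the example runs without a real language model.

```python
from typing import Callable, Optional

# Assumed marker strings for a self-ask style transcript (hypothetical format).
FOLLOW_UP = "Follow up:"
INTERMEDIATE = "Intermediate answer:"
FINAL = "So the final answer is:"


def self_ask(question: str,
             next_line: Callable[[str], str],
             answer_sub: Callable[[str], str],
             max_steps: int = 5) -> Optional[str]:
    """Run a self-ask style loop.

    next_line:  hypothetical model call; given the transcript so far,
                returns the next line (a follow-up or a final answer).
    answer_sub: hypothetical answerer for a single follow-up question.
    Returns the final answer, or None if none appears within max_steps.
    """
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        line = next_line(transcript).strip()
        transcript += line + "\n"
        if line.startswith(FINAL):
            return line[len(FINAL):].strip()
        if line.startswith(FOLLOW_UP):
            sub_question = line[len(FOLLOW_UP):].strip()
            # Record the sub-answer so the next step can build on it.
            transcript += f"{INTERMEDIATE} {answer_sub(sub_question)}\n"
    return None


# Placeholder facts and a scripted stand-in for a model (illustration only).
FACTS = {
    "Who directed Film A?": "Person X",
    "Where was Person X born?": "City B",
}
SCRIPT = [
    "Follow up: Who directed Film A?",
    "Follow up: Where was Person X born?",
    "So the final answer is: City B",
]


def scripted_model(transcript: str) -> str:
    # Choose the next scripted line by counting the answers recorded so far.
    return SCRIPT[transcript.count(INTERMEDIATE)]


if __name__ == "__main__":
    question = "Where was the director of Film A born?"
    print(self_ask(question, scripted_model, FACTS.__getitem__))
```

Keeping `answer_sub` separate from `next_line` is a design choice of this sketch: it lets the same loop use either the model itself or an external lookup to answer each follow-up question.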
Abstract
We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but not generate the overall solution, a ratio we call the compositionality gap. We evaluate this ratio by asking multi-hop questions with answers that require composing multiple facts unlikely to have been observed together during pretraining. In the GPT-3 family of models, as model size increases we show that the single-hop question answering performance improves faster than the multi-hop performance does, therefore the compositionality gap does not decrease. This surprising result suggests that while more powerful models memorize and recall more factual knowledge, they show no corresponding improvement in their ability to perform this kind of…
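As a rough illustration of the ratio described in the abstract, the sketch below computes a compositionality gap from per-question correctness flags. It assumes the denominator is the set of questions whose sub-problems were all answered correctly, which is one reading of the definition above. The `Record` type, its field names, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Correctness flags for one multi-hop question (hypothetical schema)."""
    sub_correct: List[bool]   # one flag per sub-question
    composed_correct: bool    # was the overall multi-hop answer correct?


def compositionality_gap(records: List[Record]) -> Optional[float]:
    """Fraction of questions answered incorrectly overall, among those
    whose sub-questions were all answered correctly.

    Returns None when no question has all sub-questions correct.
    """
    eligible = [r for r in records if all(r.sub_correct)]
    if not eligible:
        return None
    failures = sum(1 for r in eligible if not r.composed_correct)
    return failures / len(eligible)


if __name__ == "__main__":
    # Made-up flags for illustration; not results from the paper.
    sample = [
        Record([True, True], True),
        Record([True, True], False),
        Record([True, True], True),
        Record([True, False], False),  # excluded: a sub-question was wrong
    ]
    print(compositionality_gap(sample))
```

Under this reading, a question with a wrong sub-answer does not count toward the gap at all, so the ratio isolates failures of composition from failures of factual recall.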
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Layer Normalization · Cosine Annealing · Residual Connection · Dropout · Weight Decay · Linear Warmup With Cosine Annealing
