Faith and Fate: Limits of Transformers on Compositionality

Nouha Dziri; Ximing Lu; Melanie Sclar; Xiang Lorraine Li; Liwei Jiang; Bill Yuchen Lin; Peter West; Chandra Bhagavatula; Ronan Le Bras; Jena D. Hwang; Soumya Sanyal; Sean Welleck; Xiang Ren; Allyson Ettinger; Zaid Harchaoui; Yejin Choi

arXiv:2305.18654 · cs.CL · November 1, 2023 · 71 cites

TL;DR

This paper investigates the limitations of transformer large language models in solving complex compositional tasks, revealing they rely on subgraph matching rather than systematic reasoning, with performance degrading as task complexity increases.

Contribution

The study introduces a systematic framework to analyze transformer LLMs on compositional tasks and provides theoretical insights into their reasoning limitations.

Findings

01 Transformers solve tasks via subgraph matching, not systematic reasoning.

02 Performance declines rapidly with increased task complexity.

03 Empirical and theoretical analyses highlight fundamental limitations.

Abstract

Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This begs the question: Are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify transformer LLMs, we investigate the limits of these models across three representative compositional tasks -- multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. We formulate compositional tasks as computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures. Our empirical findings suggest that transformer LLMs…
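As a rough illustration of the computation-graph framing mentioned in the abstract, the sketch below builds a graph for multi-digit multiplication and reports its size and depth as stand-ins for task complexity. This is an assumption-laden sketch, not the authors' implementation: the `Graph` class, the `multiplication_graph` function, the choice of primitive operations (shifted one-digit products and pairwise additions), and the use of node count and depth as complexity measures are all hypothetical choices made here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical computation graph; nodes are stored in topological order."""
    parents: list = field(default_factory=list)  # parents[i]: ids node i depends on
    values: list = field(default_factory=list)   # values[i]: value computed at node i

    def add(self, value, parents=()):
        """Append a node and return its id."""
        self.parents.append(tuple(parents))
        self.values.append(value)
        return len(self.values) - 1

    def depth(self):
        """Length (in edges) of the longest path from an input node."""
        d = []
        for ps in self.parents:
            d.append(1 + max(d[p] for p in ps) if ps else 0)
        return max(d, default=0)


def multiplication_graph(a: int, b: int):
    """Build a graph for a * b (non-negative ints) from one-digit products and additions."""
    g = Graph()
    # Input nodes: digits of each operand, least significant first.
    a_digits = [g.add(int(c)) for c in reversed(str(a))]
    b_digits = [g.add(int(c)) for c in reversed(str(b))]
    total = None
    for j, bj in enumerate(b_digits):
        # One-digit products, each shifted to its place value.
        terms = [
            g.add(g.values[ai] * g.values[bj] * 10 ** (i + j), (ai, bj))
            for i, ai in enumerate(a_digits)
        ]
        # Sum the terms of this partial product one addition at a time.
        partial = terms[0]
        for t in terms[1:]:
            partial = g.add(g.values[partial] + g.values[t], (partial, t))
        # Fold the partial product into the running total.
        if total is None:
            total = partial
        else:
            total = g.add(g.values[total] + g.values[partial], (total, partial))
    return g, total


if __name__ == "__main__":
    for a, b in [(12, 34), (123, 456), (1234, 5678)]:
        g, out = multiplication_graph(a, b)
        print(a, "*", b, "=", g.values[out],
              "| nodes:", len(g.values), "| depth:", g.depth())
```

Under these assumptions, both the node count and the depth grow with the number of digits, which gives one concrete way to read "level of complexity" for the multiplication task.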

Code & Models

Repositories

nouhadziri/faith-and-fate
Official

Videos

ChatGPT Fails Basic Logic but Now Has Vision, Wins at Chess and Prompts a Masterpiece · YouTube

Faith and Fate: Limits of Transformers on Compositionality · SlidesLive

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Natural Language Processing Techniques