Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

R. Thomas McCoy; Shunyu Yao; Dan Friedman; Matthew Hardy; Thomas L. Griffiths

arXiv:2309.13638 · cs.CL · September 26, 2023 · 34 cites

Open Access

TL;DR

This paper proposes a teleological approach to understanding large language models: analyzing the problem they are trained to solve, next-word prediction, shows how the probabilities of the task, the input, and the target output influence their accuracy and failure modes.

Contribution

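It identifies three probabilistic factors hypothesized to influence LLM accuracy (the probability of the task, of the target output, and of the provided input) and empirically tests the resulting predictions on GPT-3.5 and GPT-4 across multiple tasks.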

Findings

01. LLMs perform better when task and output probabilities are high.

02. GPT-4's accuracy at decoding a simple cipher drops from 51% when the output is a high-probability word sequence to 13% when it is low-probability.

03. LLMs are shaped by their training pressures, not by human-like cognition.
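To make the cipher finding concrete, here is a minimal sketch of why decoding counts as a deterministic task: a conventional program recovers the plaintext equally well whether the output is a likely or an unlikely word sequence. The choice of rot-13 and both example sentences are assumptions made for illustration; this page only says "cipher" and does not list the paper's actual stimuli.

```python
import codecs


def rot13_decode(ciphertext: str) -> str:
    """Decode rot-13 text (assumed cipher; rot-13 is its own inverse)."""
    return codecs.decode(ciphertext, "rot_13")


# Hypothetical stimuli: one ordinary sentence and one scrambled,
# low-probability ordering of the same words.
high_probability = "The cat sat on the mat."
low_probability = "Mat the on sat cat the."

for plaintext in (high_probability, low_probability):
    ciphertext = codecs.encode(plaintext, "rot_13")
    # The rule-based decoder is exact for both inputs; how probable
    # the output is plays no role in whether the answer is correct.
    assert rot13_decode(ciphertext) == plaintext
```

A rule-based decoder has the same accuracy on both sentences, so any accuracy gap between them in an LLM has to come from the model rather than from the task.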

Abstract

The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. This approach - which we call the teleological approach - leads us to identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. We predict that LLMs will achieve higher accuracy when these probabilities are high than when they are low - even in deterministic settings where…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification