Arrows of Time for Large Language Models

Vassilis Papadopoulos; Jérémie Wenger; Clément Hongler

arXiv:2401.17505·cs.LG·July 25, 2024·1 cites

TL;DR

This paper investigates the time asymmetry in large language models, revealing a consistent difference in their ability to predict next versus previous tokens, and provides a theoretical explanation for this phenomenon.

Contribution

The paper uncovers a subtle time asymmetry in LLMs' predictive abilities and offers an information-theoretic framework explaining how this asymmetry can emerge from sparsity and computational complexity considerations.

Findings

01 Empirical evidence of time asymmetry in LLMs' perplexity scores.

02 Theoretical explanation linking asymmetry to sparsity and computational complexity.

03 Consistency of asymmetry across modalities and model sizes.

Abstract

We study the probabilistic modeling performed by Autoregressive Large Language Models (LLMs) through the angle of time directionality, addressing a question first raised in (Shannon, 1951). For large enough models, we empirically find a time asymmetry in their ability to learn natural language: a difference in the average log-perplexity when trying to predict the next token versus when trying to predict the previous one. This difference is at the same time subtle and very consistent across various modalities (language, model size, training time, ...). Theoretically, this is surprising: from an information-theoretic point of view, there should be no such difference. We provide a theoretical framework to explain how such an asymmetry can appear from sparsity and computational complexity considerations, and outline a number of perspectives opened by our results.
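
The claim that "there should be no such difference" follows from the chain rule of probability: factorizing the same joint distribution left-to-right or right-to-left yields the same total entropy. The sketch below checks this numerically on a toy random distribution over short sequences. It is only an illustration under stated assumptions, not the authors' experimental setup; the helper names (`random_joint`, `marginal`, `chain_rule_entropy`) and the toy vocabulary and sequence sizes are hypothetical.

```python
import itertools
import math
import random


def random_joint(vocab_size, length, seed=0):
    """Toy joint distribution over all sequences of a given length (assumption: random weights)."""
    rng = random.Random(seed)
    seqs = list(itertools.product(range(vocab_size), repeat=length))
    weights = [rng.random() for _ in seqs]
    total = sum(weights)
    return {s: w / total for s, w in zip(seqs, weights)}


def marginal(joint, positions):
    """Marginal distribution of the tokens at the given positions, in that order."""
    out = {}
    for seq, p in joint.items():
        key = tuple(seq[i] for i in positions)
        out[key] = out.get(key, 0.0) + p
    return out


def chain_rule_entropy(joint, order):
    """Sum over k of H(X_order[k] | X_order[0..k-1]), in nats."""
    total = 0.0
    for k in range(len(order)):
        context = marginal(joint, order[:k])
        context_and_token = marginal(joint, order[:k + 1])
        for key, p in context_and_token.items():
            if p > 0:
                # p / context[...] is the conditional probability of the k-th token.
                total -= p * math.log(p / context[key[:-1]])
    return total


joint = random_joint(vocab_size=3, length=4)
forward = chain_rule_entropy(joint, [0, 1, 2, 3])   # predict the next token
backward = chain_rule_entropy(joint, [3, 2, 1, 0])  # predict the previous token
joint_entropy = -sum(p * math.log(p) for p in joint.values() if p > 0)

print(f"forward:  {forward:.6f} nats")
print(f"backward: {backward:.6f} nats")
print(f"joint:    {joint_entropy:.6f} nats")
```

Both factorizations recover the joint entropy up to floating-point error, so an ideal learner would reach the same average log-perplexity in either direction. Any gap measured in practice must therefore come from how well each direction can be learned, which is what the paper's sparsity and complexity arguments address.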

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques