Autoregressive Modeling with Lookahead Attention

Li Du; Hongyuan Mei; Jason Eisner

arXiv:2305.12272 · cs.CL · May 23, 2023 · 1 citation

TL;DR

This paper introduces a Transformer-based autoregressive model that extrapolates multiple hypothetical continuations of the past and attends to them when predicting the next token. It outperforms a standard Transformer of comparable size on tasks such as morphological inflection and Boolean satisfiability.

Contribution

The paper proposes a lookahead attention mechanism for autoregressive models. Inspired by classical AI systems such as board-game players, it uses estimates of future trajectories to improve next-token prediction.

Findings

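01

Outperforms a standard Transformer of comparable size on multiple tasks, including morphological inflection and Boolean satisfiability.

02

On some tasks, appears to benefit from the extra computation rather than from the lookahead information itself.

03

Discusses potential architectures and speedups for future work.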

Abstract

To predict the next token, autoregressive models ordinarily examine the past. Could they also benefit from examining hypothetical futures? We consider a novel Transformer-based autoregressive architecture that estimates the next-token distribution by extrapolating multiple continuations of the past, according to some proposal distribution, and attending to these extended strings. This architecture draws insights from classical AI systems such as board game players: when making a local decision, a policy may benefit from exploring possible future trajectories and analyzing them. On multiple tasks including morphological inflection and Boolean satisfiability, our lookahead model outperforms the ordinary Transformer model of comparable size. However, on some tasks, it appears to benefit from the extra computation without actually using the lookahead information. We…
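The abstract describes the mechanism only at a high level, so here is a minimal, hypothetical sketch of its data flow in Python under toy assumptions. It is not the authors' implementation. A hand-written bigram table stands in for the proposal distribution, and mean-pooled 2-d token embeddings stand in for the Transformer encoder. The names `PROPOSAL`, `EMB`, `encode`, `sample_continuation`, and `lookahead_next_token_dist` are illustrative only. The sketch samples several continuations of the past, encodes each extended string, and lets a query computed from the past attend over those encodings before producing a next-token distribution.

```python
import math
import random

# Hypothetical toy vocabulary and 2-d token embeddings (illustrative values only).
VOCAB = ["a", "b", "c", "</s>"]
EMB = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7], "</s>": [-1.0, -1.0]}

# Hypothetical proposal distribution q(next | previous token): a bigram table
# standing in for whatever proposal model the paper actually uses.
PROPOSAL = {
    "a": [0.1, 0.6, 0.2, 0.1],
    "b": [0.5, 0.1, 0.3, 0.1],
    "c": [0.3, 0.3, 0.1, 0.3],
}


def softmax(xs):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_continuation(past, length, rng):
    """Extrapolate one hypothetical future of up to `length` tokens from the proposal."""
    out = []
    prev = past[-1]
    for _ in range(length):
        if prev == "</s>":  # stop extrapolating once end-of-sequence is sampled
            break
        prev = rng.choices(VOCAB, weights=PROPOSAL[prev])[0]
        out.append(prev)
    return out


def encode(tokens):
    """Stand-in for a Transformer encoder: mean of token embeddings (tokens non-empty)."""
    d = len(EMB[VOCAB[0]])
    vec = [0.0] * d
    for t in tokens:
        for i, x in enumerate(EMB[t]):
            vec[i] += x
    return [x / len(tokens) for x in vec]


def lookahead_next_token_dist(past, num_futures=4, length=3, seed=0):
    """Next-token distribution from the past plus attention over sampled futures."""
    rng = random.Random(seed)
    query = encode(past)
    # Encode each extended string: past + one sampled continuation.
    keys = [encode(past + sample_continuation(past, length, rng))
            for _ in range(num_futures)]
    d = len(query)
    # Scaled dot-product attention scores of the past-query against each future.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Attention-weighted summary of the futures (values reuse the keys here).
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]
    # Combine past and lookahead summaries; score tokens with tied embeddings.
    hidden = [q + c for q, c in zip(query, context)]
    logits = [sum(h * e for h, e in zip(hidden, EMB[tok])) for tok in VOCAB]
    return dict(zip(VOCAB, softmax(logits)))


if __name__ == "__main__":
    dist = lookahead_next_token_dist(["a", "b"], num_futures=4, length=3, seed=0)
    print(max(dist, key=dist.get), dist)
```

Reusing the keys as values and tying the output embeddings keeps the sketch short. The paper's model is a full Transformer, so this only illustrates how sampled futures can feed into the next-token prediction.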

Taxonomy

Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Evolutionary Algorithms and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Position-Wise Feed-Forward Layer · Adam