Provable Length Generalization in Sequence Prediction via Spectral Filtering

Annie Marsden; Evan Dogariu; Naman Agarwal; Xinyi Chen; Daniel Suo; Elad Hazan

arXiv:2411.01035·cs.LG·November 5, 2024

Open Access

TL;DR

This paper defines a new performance metric, the Asymmetric-Regret, and shows that a gradient-based spectral filtering algorithm provably achieves length generalization for linear dynamical systems, with proof-of-concept experiments consistent with the theory.

Contribution

It defines the Asymmetric-Regret metric and presents a gradient-based learning algorithm, built on spectral filtering, with provable length-generalization guarantees for linear dynamical systems.

Findings

1. Spectral filtering provably achieves length generalization for linear dynamical systems.

2. The Asymmetric-Regret metric measures the learner's regret against a benchmark predictor that has a longer context length than the learner.

3. Proof-of-concept experiments are consistent with the theory.
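One way to read the Asymmetric-Regret described above is as an ordinary regret in which the comparator class is allowed more context than the learner. The formula below is only a sketch of that idea: the symbols $\mathcal{A}$ (learner), $L < L'$ (context lengths), $\Pi_{L'}$ (benchmark class), $\ell$ (loss), and $u_t$, $y_t$ (inputs and targets) are notation assumed here for illustration and may differ from the paper's own definition.

```latex
\mathrm{AsymRegret}_T(\mathcal{A};\, L, L')
  \;=\; \sum_{t=1}^{T} \ell\big(y_t,\ \hat{y}_t^{\mathcal{A}}\big)
  \;-\; \min_{\pi \in \Pi_{L'}} \sum_{t=1}^{T} \ell\big(y_t,\ \pi(u_{t-L':t})\big),
  \qquad L < L'
```

Here $\hat{y}_t^{\mathcal{A}}$ is the learner's prediction using only the last $L$ inputs, while each benchmark predictor $\pi$ sees the last $L'$ inputs. Small regret under this asymmetry is what "length generalization" means in this setting.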

Abstract

We consider the problem of length generalization in sequence prediction. We define a new metric of performance in this setting -- the Asymmetric-Regret -- which measures regret against a benchmark predictor with longer context length than available to the learner. We continue by studying this concept through the lens of the spectral filtering algorithm. We present a gradient-based learning algorithm that provably achieves length generalization for linear dynamical systems. We conclude with proof-of-concept experiments which are consistent with our theory.
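The abstract does not spell out how the spectral filters are built. As a minimal sketch, assuming the Hankel-matrix construction commonly used in the spectral filtering literature (entries 2 / ((i+j)^3 - (i+j)), with filters taken as the top eigenvectors), the filters and the resulting features for a short context window could be computed as follows. The function names `spectral_filters` and `filter_features` are hypothetical, and this is not claimed to be the paper's exact algorithm.

```python
import numpy as np


def spectral_filters(context_len: int, num_filters: int):
    """Return the top eigenvalues and eigenvectors of the Hankel matrix.

    Assumption: Z[i, j] = 2 / ((i + j)^3 - (i + j)) with 1-indexed i, j.
    """
    idx = np.arange(1, context_len + 1)
    s = idx[:, None] + idx[None, :]            # i + j for every entry
    hankel = 2.0 / (s**3 - s)
    eigvals, eigvecs = np.linalg.eigh(hankel)  # eigh returns ascending order
    top = np.argsort(eigvals)[::-1][:num_filters]
    return eigvals[top], eigvecs[:, top]


def filter_features(inputs: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Project the most recent inputs onto each filter.

    inputs:  (T, d) array with the newest input last, T >= context_len.
    filters: (context_len, num_filters) array from spectral_filters.
    Returns a (num_filters, d) array of features.
    """
    context_len = filters.shape[0]
    window = inputs[-context_len:][::-1]       # newest input first
    return filters.T @ window


# Usage: 4 filters over a context of 16 steps, applied to 3-dimensional inputs.
sigma, phi = spectral_filters(context_len=16, num_filters=4)
inputs = np.random.default_rng(0).normal(size=(20, 3))
features = filter_features(inputs, phi)
```

A learner in this style would then fit a linear map from `features` to the next output by gradient descent. The filters depend only on the context length, not on the data, so the only learned parameters are that linear map.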

Taxonomy

Topics: Speech Recognition and Synthesis · Neural Networks and Applications