Provable Length Generalization in Sequence Prediction via Spectral Filtering
Annie Marsden, Evan Dogariu, Naman Agarwal, Xinyi Chen, Daniel Suo, Elad Hazan

TL;DR
This paper introduces a spectral filtering approach with a new performance metric, demonstrating provable length generalization in sequence prediction for linear dynamical systems through theoretical analysis and experiments.
Contribution
It proposes a novel spectral filtering algorithm and the Asymmetric-Regret metric, providing the first provable guarantees for length generalization in sequence prediction.
Findings
Spectral filtering achieves length generalization in linear dynamical systems.
The Asymmetric-Regret metric effectively measures performance against longer context predictors.
Experiments support the theoretical claims of the proposed method.
Abstract
We consider the problem of length generalization in sequence prediction. We define a new metric of performance in this setting -- the Asymmetric-Regret -- which measures regret against a benchmark predictor with longer context length than available to the learner. We continue by studying this concept through the lens of the spectral filtering algorithm. We present a gradient-based learning algorithm that provably achieves length generalization for linear dynamical systems. We conclude with proof-of-concept experiments which are consistent with our theory.
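The idea behind a regret measured against a longer-context benchmark can be illustrated with a small sketch. This is not the paper's algorithm or its formal definition: the function names (`predict_with_context`, `cumulative_loss`, `asymmetric_regret`), the squared loss, and the use of two fixed linear predictors are all assumptions made for illustration. In particular, the benchmark here is a single hand-picked predictor, whereas a formal regret definition would typically compare against the best predictor in some class.

```python
def predict_with_context(history, weights):
    """Linear prediction from the most recent len(weights) observations.

    weights[0] multiplies the most recent observation. Observations that
    do not exist yet (start of the sequence) are treated as zero.
    """
    total = 0.0
    for lag, w in enumerate(weights, start=1):
        if lag <= len(history):
            total += w * history[-lag]
    return total


def cumulative_loss(sequence, weights):
    """Sum of squared one-step prediction errors over the whole sequence."""
    loss = 0.0
    for t, y in enumerate(sequence):
        y_hat = predict_with_context(sequence[:t], weights)
        loss += (y - y_hat) ** 2
    return loss


def asymmetric_regret(sequence, learner_weights, benchmark_weights):
    """Learner loss (short context) minus benchmark loss (longer context).

    Hypothetical helper: both predictors are fixed here, which is a
    simplification of any formal best-in-class benchmark.
    """
    if len(learner_weights) > len(benchmark_weights):
        raise ValueError("benchmark is expected to have the longer context")
    return cumulative_loss(sequence, learner_weights) - cumulative_loss(
        sequence, benchmark_weights
    )


# Example: a sequence with constant increments, y_t = 2*y_{t-1} - y_{t-2}.
sequence = [1.0, 2.0, 3.0, 4.0]
learner = [1.0]            # context length 1: repeat the last value
benchmark = [2.0, -1.0]    # context length 2: linear extrapolation
regret = asymmetric_regret(sequence, learner, benchmark)
```

On this toy sequence the context-1 learner is off by one at every step, while the context-2 benchmark errs only on the first step, so the regret is positive. A length-generalization guarantee would bound how large this gap can grow even though the learner sees less context than the benchmark.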
Taxonomy
Topics: Speech Recognition and Synthesis · Neural Networks and Applications
