Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds
Yunbei Xu, Yuzhe Yuan, Ruohan Zhan

TL;DR
This paper analyzes autoregressive sequence learning under model misspecification, measured in joint KL divergence, and characterizes how the sequence horizon \(H\) affects both the approximation error and the estimation error.
Contribution
The paper establishes what the authors describe as, to their knowledge, the first complete characterization of long-horizon error behavior under joint KL, including matching upper and lower bounds and a horizon-free approximation factor.
Findings
Joint KL admits a horizon-free approximation factor, whereas Hellinger-based analyses exhibit an \(\Omega(H)\) dependence for computationally efficient methods.
The estimation error has a fundamental lower bound of order \(H\), which matches the upper bounds.
Joint KL guarantees imply policy learning regret bounds similar to existing imitation learning results.
Abstract
We study the fundamental and timely problem of learning long sequences in autoregressive modeling and next-token prediction under model misspecification, measured by the joint Kullback--Leibler (KL) divergence. Our goal is to characterize how the sequence horizon \(H\) affects both approximation and estimation errors in this joint-distribution, sequence-level regime. By establishing matching upper and lower bounds, we provide, to our knowledge, the first complete characterization of long-horizon error behavior under the natural joint KL objective, with improved rates and optimality justification relative to existing work. On the approximation side, we show that joint KL admits a horizon-free approximation factor, in sharp contrast to Hellinger-based analyses that exhibit an \(\Omega(H)\) dependence for computationally efficient methods; this isolates the choice of divergence as the…
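As a small illustration of why joint KL is a natural sequence-level objective for next-token prediction, the standard chain rule for KL divergence gives \(\mathrm{KL}(P \,\|\, Q) = \sum_{h=1}^{H} \mathbb{E}_{x_{<h} \sim P}\big[\mathrm{KL}(P(\cdot \mid x_{<h}) \,\|\, Q(\cdot \mid x_{<h}))\big]\). The sketch below checks this identity numerically on a toy autoregressive model. It is not the paper's method or experiment: the horizon `H = 4`, the vocabulary size `V = 3`, the random conditional tables, and all function names are assumptions made for illustration.

```python
import itertools
import math
import random


def random_conditionals(rng, horizon, vocab):
    """Assign a random next-token distribution to every prefix of length < horizon."""
    table = {}
    for h in range(horizon):
        for prefix in itertools.product(range(vocab), repeat=h):
            # Offset by 0.1 so every probability is strictly positive.
            weights = [rng.random() + 0.1 for _ in range(vocab)]
            total = sum(weights)
            table[prefix] = [w / total for w in weights]
    return table


def sequence_prob(table, seq):
    """Probability of a (possibly partial) sequence under the autoregressive model."""
    prob = 1.0
    for h, token in enumerate(seq):
        prob *= table[seq[:h]][token]
    return prob


def joint_kl(p, q, horizon, vocab):
    """KL between the two joint distributions, by enumerating all full sequences."""
    total = 0.0
    for seq in itertools.product(range(vocab), repeat=horizon):
        ps = sequence_prob(p, seq)
        qs = sequence_prob(q, seq)
        total += ps * math.log(ps / qs)
    return total


def summed_conditional_kl(p, q, horizon, vocab):
    """Sum over steps of the expected next-token KL, with prefixes drawn from p."""
    total = 0.0
    for h in range(horizon):
        for prefix in itertools.product(range(vocab), repeat=h):
            step_kl = sum(
                pi * math.log(pi / qi) for pi, qi in zip(p[prefix], q[prefix])
            )
            total += sequence_prob(p, prefix) * step_kl
    return total


rng = random.Random(0)
H, V = 4, 3  # hypothetical toy sizes
p = random_conditionals(rng, H, V)  # stands in for the data distribution
q = random_conditionals(rng, H, V)  # stands in for a misspecified model
lhs = joint_kl(p, q, H, V)
rhs = summed_conditional_kl(p, q, H, V)
print(f"joint KL = {lhs:.6f}, sum of per-step KLs = {rhs:.6f}")
```

The two quantities agree up to floating-point error, so controlling the joint KL is the same as controlling the accumulated next-token KL along trajectories drawn from the data distribution. The sketch says nothing about how the error scales with \(H\); that is the question the paper's bounds address.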
