Long-Context Linear System Identification
Oğuz Kaan Yüksel, Mathieu Even, Nicolas Flammarion

TL;DR
This paper develops sample complexity bounds for long-context linear system identification, revealing that learning is feasible without mixing assumptions and extending results to low-rank and misspecified context models.
Contribution
It introduces the first sample complexity bounds for long-context linear systems, demonstrating learning without mixing and extending to low-rank and misspecified contexts.
Findings
Sample complexity matches the i.i.d. parametric rate up to logarithmic factors.
Learning is not hindered by slow mixing in long contexts.
Low-rank regularization improves dependence on dimensionality.
Abstract
This paper addresses the problem of long-context linear system identification, where the state of a dynamical system at time $t$ depends linearly on previous states over a fixed context window of length $p$. We establish a sample complexity bound that matches the i.i.d. parametric rate up to logarithmic factors for a broad class of systems, extending previous works that considered only first-order dependencies. Our findings reveal a learning-without-mixing phenomenon, indicating that learning long-context linear autoregressive models is not hindered by slow mixing properties potentially associated with extended context windows. Additionally, we extend these results to (i) shared low-rank representations, where rank-regularized estimators improve the dependence of the rates on the dimensionality, and (ii) misspecified context lengths in strictly stable systems, where shorter…
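As a rough illustration of the setting the abstract describes, the sketch below simulates a linear autoregressive process whose state depends on the previous p states and then fits the coefficient matrices by ordinary least squares on a single trajectory. The function names (`simulate_ar`, `ols_estimate`), the Gaussian noise, the zero initial condition, and the use of plain unregularized least squares are all assumptions made for this example; this is not necessarily the estimator analyzed in the paper.

```python
import numpy as np


def simulate_ar(A, T, noise_std=1.0, seed=0):
    """Simulate x_t = sum_{k=1..p} A_k x_{t-k} + noise (hypothetical helper).

    A has shape (p, d, d), with A[k] playing the role of A_{k+1}.
    The first p states are set to zero (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    p, d, _ = A.shape
    x = np.zeros((T + p, d))
    for t in range(p, T + p):
        # past[k] holds x_{t-1-k}, i.e. the most recent state comes first.
        past = x[t - p:t][::-1]
        x[t] = np.einsum("kij,kj->i", A, past) + noise_std * rng.standard_normal(d)
    return x


def ols_estimate(x, p):
    """Least-squares estimate of (A_1, ..., A_p) from one trajectory."""
    n, d = x.shape
    # Regressor for time t stacks x_{t-1}, ..., x_{t-p} side by side.
    Z = np.hstack([x[p - k:n - k] for k in range(1, p + 1)])  # (n - p, p * d)
    Y = x[p:]                                                 # (n - p, d)
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)                 # Y ≈ Z @ W
    # W.T is the block matrix [A_1 ... A_p]; split it back into p blocks.
    return W.T.reshape(d, p, d).transpose(1, 0, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p, d = 4, 3
    A = rng.standard_normal((p, d, d))
    # Rescale each block to operator norm 0.2 so the norms sum to 0.8 < 1,
    # which is a simple sufficient condition for stability.
    A = np.stack([0.2 * M / np.linalg.norm(M, 2) for M in A])
    x = simulate_ar(A, T=5000)
    A_hat = ols_estimate(x, p)
    print("estimation error (Frobenius):", np.linalg.norm(A_hat - A))
```

Calling `ols_estimate(x, p_prime)` with `p_prime < p` fits a shorter context than the one that generated the data, which gives a crude way to experiment with the misspecified-context setting mentioned in the abstract.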
Peer Reviews
Decision: ICLR 2025 Poster
**Clarity of exposition:** The paper is well-written and well-organized and systematically introduces the problem setting, contributions, and theoretical derivations. Definitions and assumptions are clearly stated, and the logical progression through each theoretical component makes the paper easy to follow. **Intuitive and well-discussed results:** The concept of "learning-without-mixing" is well-motivated by the authors. This result aligns with the literature on "learning-without-mixing" for
**Misspecification Results and Assumptions:** Section 3.4, particularly Assumption 3.9, imposes a constraint on the misspecified model that may be too restrictive for practical applications. The requirement that $|| (MA^\star - MA^\star_{1:p'})L^\star ||_{\text{op}} \leq D'$ implies that misspecification must remain controlled to a certain degree. The authors could discuss the limitations of Assumption 3.9 if this assumption does not hold in practical settings or offer heuristics for relaxing th
While the topic of linear identification is certainly not new, the theoretical results developed in this paper are novel. The authors clearly state the problem formulations, main results, and motivations. Overall, the paper is well written and an enjoyable read.
There are several minor questions and suggestions regarding confusions in the main text (these are deferred to the questions section below). Experiments were minimal and only provided in the appendix.
The results provide non-asymptotic bounds on three problems that are well motivated; earlier works have not covered the case where the process has dependency on the past with a context length.
(1) The non-asymptotic bounds are not with respect to any specific algorithm that takes data and solves the related optimization problems. The authors indicate possible approaches for solving these problems but do not analyze any specific algorithm; however, it stands to reason that the sample complexity will depend on the approach being taken. The rank-constrained problem is a particularly challenging one, as it is not a convex problem. The authors assume an optimal solution to the problems. Th
Taxonomy
Topics: Control Systems and Identification · Fault Detection and Control Systems
