Long-Context Linear System Identification

Oğuz Kaan Yüksel; Mathieu Even; Nicolas Flammarion

arXiv:2410.05690·stat.ML·July 3, 2025

TL;DR

This paper develops sample complexity bounds for long-context linear system identification, revealing that learning is feasible without mixing assumptions and extending results to low-rank and misspecified context models.

Contribution

It introduces the first sample complexity bounds for long-context linear systems, demonstrating learning without mixing and extending to low-rank and misspecified contexts.

Findings

01 Sample complexity matches i.i.d. parametric rates up to logarithmic factors.

02 Learning is not hindered by slow mixing in long contexts.

03 Low-rank regularization improves dependence on dimensionality.

Abstract

This paper addresses the problem of long-context linear system identification, where the state $x_{t}$ of a dynamical system at time $t$ depends linearly on previous states $x_{s}$ over a fixed context window of length $p$. We establish a sample complexity bound that matches the i.i.d. parametric rate up to logarithmic factors for a broad class of systems, extending previous works that considered only first-order dependencies. Our findings reveal a learning-without-mixing phenomenon, indicating that learning long-context linear autoregressive models is not hindered by slow mixing properties potentially associated with extended context windows. Additionally, we extend these results to (i) shared low-rank representations, where rank-regularized estimators improve the dependence of the rates on the dimensionality, and (ii) misspecified context lengths in strictly stable systems, where shorter…
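To make the setting in the abstract concrete, here is a minimal sketch of an order-$p$ linear autoregression fitted by ordinary least squares. It is an illustration under stated assumptions, not the paper's estimator or experimental setup: the additive Gaussian noise, the zero initial states, the function names `simulate` and `fit_ols`, and all numeric values are hypothetical choices made for this example. The test system is scaled to be stable purely for numerical convenience.

```python
import numpy as np


def simulate(A, T, noise_std=0.1, seed=0):
    """Simulate x_t = sum_k A[k] @ x_{t-1-k} + noise (assumed Gaussian noise)."""
    rng = np.random.default_rng(seed)
    p, d, _ = A.shape
    x = np.zeros((T + p, d))  # first p rows are the (assumed) zero initial states
    for t in range(p, T + p):
        past = x[t - p:t][::-1]  # rows are x_{t-1}, ..., x_{t-p}
        x[t] = np.einsum("kij,kj->i", A, past) + noise_std * rng.standard_normal(d)
    return x


def fit_ols(x, p):
    """Least-squares estimate of the p lag matrices from a single trajectory."""
    T, d = x.shape
    # Each regression row stacks the context window [x_{t-1}, ..., x_{t-p}].
    Z = np.hstack([x[p - k:T - k] for k in range(1, p + 1)])  # shape (T-p, p*d)
    Y = x[p:]                                                  # shape (T-p, d)
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)                  # shape (p*d, d)
    return W.T.reshape(d, p, d).transpose(1, 0, 2)             # shape (p, d, d)


# Hypothetical test system: p = 3 lags, d = 4 state dimensions.
rng = np.random.default_rng(1)
p, d = 3, 4
A_true = np.empty((p, d, d))
for k in range(p):
    M = rng.standard_normal((d, d))
    A_true[k] = 0.15 * M / np.linalg.norm(M, 2)  # operator norms sum to 0.45 < 1

x = simulate(A_true, T=5000)
A_hat = fit_ols(x, p)
print("max abs error:", np.max(np.abs(A_hat - A_true)))
```

The regression has $p d^2$ unknown parameters, which is the parameter count that a parametric rate would refer to in this unconstrained setting; rerunning with a larger `T` gives a rough feel for how the error shrinks with trajectory length.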

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

**Clarity of exposition:** The paper is well-written and well-organized and systematically introduces the problem setting, contributions, and theoretical derivations. Definitions and assumptions are clearly stated, and the logical progression through each theoretical component makes the paper easy to follow. **Intuitive and well-discussed results:** The concept of "learning-without-mixing" is well-motivated by the authors. This result aligns with the literature on "learning-without-mixing" for

Weaknesses

**Misspecification Results and Assumptions:** Section 3.4, particularly Assumption 3.9, imposes a constraint on the misspecified model that may be too restrictive for practical applications. The requirement that $\| (MA^\star - MA^\star_{1:p'})L^\star \|_{\text{op}} \leq D'$ implies that misspecification must remain controlled to a certain degree. The authors could discuss the limitations of Assumption 3.9 if this assumption does not hold in practical settings or offer heuristics for relaxing th

Reviewer 02 · Rating 8 · Confidence 3

Strengths

While the topic of linear identification is certainly not new, the theoretical results developed in this paper are novel. The authors clearly state the problem formulations, main results, and motivations. Overall, the paper is well written and an enjoyable read.

Weaknesses

There are several minor questions and suggestions regarding unclear points in the main text (these are deferred to the questions section below). Experiments were minimal and only provided in the appendix.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The results provide non-asymptotic bounds on three problems that are well motivated; earlier works have not covered the case where the process has dependency on the past with a context length.

Weaknesses

(1) The non-asymptotic bounds are not with respect to any specific algorithm that takes data and solves the related optimization problems. The authors indicate possible approaches for solving these problems but do not analyze any specific algorithm; however, it stands to reason that the sample complexity will depend on the approach being taken. The rank-constrained problem is a particularly challenging one as it is not a convex problem. The authors assume an optimal solution to the problems. Th

Taxonomy

Topics: Control Systems and Identification · Fault Detection and Control Systems