Learning to Dissipate Energy in Oscillatory State-Space Models
Jared Boyer, T. Konstantin Rusch, Daniela Rus

TL;DR
This paper introduces Damped Linear Oscillatory State-Space models (D-LinOSS), which enhance previous models by learning energy dissipation on arbitrary time scales, leading to improved long-range sequence learning and faster convergence.
Contribution
The paper proposes D-LinOSS, a generalization of LinOSS that learns to dissipate energy across various time scales, overcoming limitations of fixed dissipation mechanisms.
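The sketch below makes the "fixed dissipation" limitation concrete by comparing eigenvalue magnitudes of two per-dimension 2x2 transition matrices on the state (z, x). It rests on several assumptions, none of which come from the paper. The continuous system is taken to be x'' = -a x - g x' + b u, with scalar stiffness a >= 0, damping g >= 0 and step dt. The first matrix uses a fully implicit undamped step, in the style of LinOSS-IM. The second uses an implicit-explicit step that treats only the damping implicitly. The symbols and both schemes are illustrative, not the authors' verbatim parameterization. For a complex-conjugate pair, |lambda|^2 equals the determinant. That gives 1/(1 + dt^2 a) in the first case, so forgetting is tied to the oscillation frequency. In the second case it gives 1/(1 + dt g), so forgetting can be set independently of frequency.

```python
import cmath
import math


def eig2(m):
    """Eigenvalues of a 2x2 matrix [[p, q], [r, s]] from its trace and determinant."""
    (p, q), (r, s) = m
    tr, det = p + s, p * s - q * r
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def implicit_undamped_matrix(a, dt):
    """Assumed LinOSS-IM-style step on (z, x): z' = -a x, x' = z, both updated implicitly.

    det = 1 / (1 + dt^2 a), so |lambda| of a complex pair depends on the frequency a.
    """
    s = 1.0 / (1.0 + dt * dt * a)
    return [[s, -dt * a * s], [dt * s, 1.0 - dt * dt * a * s]]


def damped_matrix(a, g, dt):
    """Assumed damped step on (z, x): z' = -a x - g z, x' = z.

    Stiffness is explicit, damping is implicit, and x is advanced with the new z.
    det = 1 / (1 + dt g), so |lambda| of a complex pair is set by g alone.
    """
    s = 1.0 / (1.0 + dt * g)
    return [[s, -dt * a * s], [dt * s, 1.0 - dt * dt * a * s]]


if __name__ == "__main__":
    dt = 1.0
    for a in (0.5, 2.0):
        lin = abs(eig2(implicit_undamped_matrix(a, dt))[0])
        damp = abs(eig2(damped_matrix(a, 1.0, dt))[0])
        print(f"a={a}: implicit undamped |lambda|={lin:.4f}, damped (g=1) |lambda|={damp:.4f}")
```

With dt = 1, the implicit undamped magnitude falls from about 0.82 to about 0.58 as a grows from 0.5 to 2. The damped magnitude stays at sqrt(0.5), about 0.71, for both frequencies.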
Findings
D-LinOSS outperforms LinOSS on long-range tasks.
D-LinOSS achieves faster convergence.
D-LinOSS reduces the hyperparameter search space by 50%.
Abstract
State-space models (SSMs) are a class of networks for sequence learning that benefit from fixed state size and linear complexity with respect to sequence length, in contrast to the quadratic scaling of standard attention mechanisms. Inspired by observations in neuroscience, Linear Oscillatory State-Space models (LinOSS) are a recently proposed class of SSMs constructed from layers of discretized forced harmonic oscillators. Although these models perform competitively, leveraging fast parallel scans over diagonal recurrent matrices and achieving state-of-the-art performance on tasks with sequence length up to 50k, LinOSS models rely on rigid energy dissipation ("forgetting") mechanisms that are inherently coupled to the time scale of state evolution. As forgetting is a crucial mechanism for long-range reasoning, we demonstrate the representational limitations of these models and introduce…
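The recurrence is linear, so it can be evaluated with a parallel prefix scan instead of a sequential loop. This is what makes "fast parallel scans" possible. The following is an illustrative sketch, not the authors' code. It assumes that, because the recurrent matrices are diagonal, each hidden dimension reduces to its own 2x2 affine recurrence w_n = M w_{n-1} + b_n, with b_n standing in for the already-projected input. It composes these affine maps with an associative operator and runs a Hillis-Steele scan. That scan takes log2(L) sweeps, and the updates within a sweep are mutually independent, trading O(L log L) work for O(log L) depth. Practical implementations would typically call a library primitive such as `jax.lax.associative_scan` instead.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
            [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]]]


def matvec(A, v):
    """Apply a 2x2 matrix to a length-2 vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def combine(first, second):
    """Associative composition of affine maps (A, b): apply `first`, then `second`."""
    A1, b1 = first
    A2, b2 = second
    shifted = matvec(A2, b1)
    return matmul(A2, A1), [shifted[0] + b2[0], shifted[1] + b2[1]]


def sequential_states(M, inputs):
    """Reference loop: w_n = M w_{n-1} + b_n starting from w_0 = 0."""
    w, states = [0.0, 0.0], []
    for b in inputs:
        mw = matvec(M, w)
        w = [mw[0] + b[0], mw[1] + b[1]]
        states.append(w)
    return states


def scan_states(M, inputs):
    """Hillis-Steele inclusive scan over the affine maps w -> M w + b_n."""
    elems = [(M, list(b)) for b in inputs]
    step = 1
    while step < len(elems):
        # Every combine in this sweep reads only the previous sweep, so all could run in parallel.
        elems = [combine(elems[i - step], elems[i]) if i >= step else elems[i]
                 for i in range(len(elems))]
        step *= 2
    # Starting from w_0 = 0, each prefix map sends 0 to its offset vector b.
    return [b for _, b in elems]


if __name__ == "__main__":
    M = [[0.5, -0.25], [0.5, 0.75]]  # an arbitrary stable 2x2 matrix, for illustration
    inputs = [[0.1 * n, 0.0] for n in range(8)]
    print(scan_states(M, inputs)[-1], sequential_states(M, inputs)[-1])
```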
Peer Reviews
Decision · Submitted to ICLR 2026
- The authors clearly outline the contributions of their paper, highlighting the limitations and improvements over existing oscillatory state-space models (LinOSS). - The authors conduct a rigorous theoretical analysis of their method and link the new parameterization to improved representational capacity. They prove that D-LinOSS layers span the full unit disk in the complex plane. - The authors provide a controlled synthetic experiment (learning exponential decay) to justify D
- Most reported improvements in the presented empirical results are marginal. I would like to see more ablations over model size/parameter count to be sure of the practical significance of D-LinOSS. - While D-LinOSS is well-motivated, the contribution feels like a simple extension of LinOSS. Without additional analysis interpreting the learned weights, the contribution feels incremental in the broader landscape.
- The empirical results of D-LinOSS on several real-world datasets are significant, improving on the state of the art as well as on the previous LinOSS models. - The addition of damping to D-LinOSS offers clear practical and theoretical advantages over LinOSS models, enabling eigenvalues that cover the entire unit disk and thus decoupling frequency from energy dissipation. Table 1 and Figure 3 are particularly compelling illustrations of these advantages.
- Please clearly define (and if necessary, differentiate) the terms energy dissipation, forgetting, and damping, and their relation to real and imaginary eigenvalue components, eigenvalue magnitude, and oscillation frequency. In particular, energy dissipation, damping, and forgetting are often used loosely and somewhat interchangeably, which can be confusing. It may help to add a section defining important vocabulary at the beginning of Section 2. - The proof of bijectivity in Proposition A.3 is not
- The damping term is a trivial, yet needed, addition that is well motivated and, in this reviewer's opinion, should have been part of LinOSS to begin with. In this sense, the authors have identified an important gap. - The benchmarks are comprehensive and convincing that the addition of the damping term is broadly useful, though more is needed to support the specific claims (see below).
- The scientific novelty/contribution is quite limited. The base model, LinOSS, is in my opinion a very important direction that future research should build on. However, the addition here is trivial and does not necessarily lead to novel insights. To me, this paper feels like a comment on the original paper rather than having enough novelty to deserve a paper of its own. - I find the claim "reducing the hyperparameter search space by 50%" to be slightly misleading. Practically,
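Several reviews cite the result that D-LinOSS eigenvalues cover the full unit disk. That claim can be checked constructively under an assumed damped implicit-explicit step: z_{n+1} = z_n + dt(-a x_n - g z_{n+1}) and x_{n+1} = x_n + dt z_{n+1}. The resulting 2x2 transition matrix has determinant 1/(1 + dt g) and trace 1/(1 + dt g) + 1 - dt^2 a/(1 + dt g). Matching these to |lambda|^2 and 2 Re(lambda) gives a closed-form inverse: g = (1/|lambda|^2 - 1)/dt and a = |1 - lambda|^2/(dt^2 |lambda|^2). Both are non-negative for any 0 < |lambda| <= 1 with lambda != 1. This is a sketch under that assumed scheme, not the paper's Proposition A.3 proof. `params_for_eigenvalue` and `step_eigenvalues` are hypothetical helper names.

```python
import cmath


def params_for_eigenvalue(lam, dt=1.0):
    """Hypothetical helper: stiffness a and damping g whose step has eigenvalues lam, conj(lam).

    Assumes the step z <- (z - dt*a*x) / (1 + dt*g), then x <- x + dt*z (input omitted).
    """
    r2 = abs(lam) ** 2
    if r2 == 0.0 or r2 > 1.0 + 1e-12 or lam == 1:
        raise ValueError("need 0 < |lam| <= 1 and lam != 1")
    g = max(0.0, (1.0 / r2 - 1.0) / dt)      # det = 1 / (1 + dt*g) = |lam|^2
    a = abs(1 - lam) ** 2 / (dt * dt * r2)   # trace = 2 * Re(lam)
    return a, g


def step_eigenvalues(a, g, dt=1.0):
    """Eigenvalues of the assumed damped step, computed from its trace and determinant."""
    s = 1.0 / (1.0 + dt * g)                 # determinant of the step matrix
    tr = s + 1.0 - dt * dt * a * s
    disc = cmath.sqrt(tr * tr - 4.0 * s)
    return (tr + disc) / 2, (tr - disc) / 2


if __name__ == "__main__":
    for lam in (0.3 + 0.5j, -0.9 + 0.1j, 0.05 - 0.02j):
        a, g = params_for_eigenvalue(lam, dt=0.5)
        print(lam, "-> a=%.4f, g=%.4f ->" % (a, g), step_eigenvalues(a, g, dt=0.5))
```

Because g controls only the determinant and a then fixes the trace, the magnitude (forgetting rate) and the angle (frequency) of the target eigenvalue can be chosen independently. This is the decoupling the reviewers point to.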
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Neural Networks and Reservoir Computing · Advanced Memory and Neural Computing
Methods: Softmax · Attention Is All You Need
