Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation

Yehjin Shin; Seojin Kim; Noseong Park

arXiv:2603.22333·cs.LG·March 25, 2026

Open Access 3 Reviews

TL;DR

This paper introduces HADES, a GSP-inspired hierarchical adaptive filter bank framework for SSMs like Mamba2, improving efficiency and interpretability while maintaining competitive performance across language tasks.

Contribution

HADES reinterprets Mamba2 as an adaptive filter bank on a line graph, introducing hierarchical filters for global and local behaviors, bridging GSP and neural sequence modeling.

Findings

01 · HADES achieves comparable performance to Mamba2 on multiple benchmarks.

02 · HADES uses only 58.9% of the parameters of baseline models.

03 · HADES provides a structured, interpretable filtering approach for SSMs.

Abstract

State-space models (SSMs) offer efficient alternatives to attention with linear-time recurrence. Mamba2, a recent SSM-based language model, uses selective input gating and a multi-head structure, enabling parallel computation and strong benchmark performance. However, its multi-head recurrence operates independently without structured utilization or analysis. In this work, we propose a novel method called Hierarchical ADaptive filter bank for Efficient SSMs (HADES), a Graph Signal Processing (GSP)-inspired framework that reinterprets Mamba2 as an adaptive filter bank on a line graph. Our hierarchical architecture introduces two filter types: shared filters for global low-pass behavior and expert filters for local high-pass behavior, achieved through structured bias on the parameter $\Delta$. HADES achieves comparable performance to baseline models including Mamba2 across various…
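The role of a bias on $\Delta$ can be illustrated with a toy single-head recurrence. The sketch below is not the HADES implementation: it assumes a scalar state, a fixed state parameter A = -1, and a constant (input-independent) $\Delta$ = softplus(bias), whereas Mamba2 computes $\Delta$ per token. The names `head_response`, `run_head`, and `delta_bias` are hypothetical. Under those assumptions, a negative bias gives a decay close to 1 (long memory, strongly low-pass), while a positive bias gives a small decay and a much flatter response that lets relatively more high-frequency content through. A one-pole filter of this kind is never strictly high-pass, so this only shows the direction in which the bias moves the response.

```python
import numpy as np

def softplus(z):
    # Smooth positive map, commonly used to keep the step size Delta > 0.
    return np.log1p(np.exp(z))

def head_response(delta_bias, n_freq=256):
    """Magnitude response of h_t = a*h_{t-1} + delta*x_t with constant delta.

    Assumption (not from the paper): A = -1, so a = exp(-delta).
    """
    delta = softplus(delta_bias)
    a = np.exp(-delta)                      # decay factor in (0, 1)
    omega = np.linspace(0.0, np.pi, n_freq)  # 0 = DC, pi = fastest oscillation
    H = delta / (1.0 - a * np.exp(-1j * omega))
    return a, np.abs(H)

def run_head(x, delta_bias):
    """Run the same toy recurrence sequentially over a 1-D signal x."""
    delta = softplus(delta_bias)
    a = np.exp(-delta)
    h = 0.0
    out = np.empty_like(x, dtype=float)
    for t, x_t in enumerate(x):
        h = a * h + delta * x_t
        out[t] = h
    return out

def high_to_low_ratio(delta_bias):
    # Gain at the highest frequency divided by gain at DC.
    _, mag = head_response(delta_bias)
    return mag[-1] / mag[0]

if __name__ == "__main__":
    for bias in (-4.0, 0.0, 2.0):
        a, _ = head_response(bias)
        print(f"bias={bias:+.1f}  decay={a:.3f}  "
              f"high/low gain={high_to_low_ratio(bias):.3f}")
```

Running the script prints the decay and the high-to-low gain ratio for three illustrative bias values; the ratio grows as the bias increases, which is the qualitative effect the abstract attributes to structured bias on $\Delta$.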

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 5

Strengths

1. The paper introduces an interesting conceptual link between Graph Signal Processing and Mamba2, reframing multi-head SSMs as filter banks. This GSP perspective motivates a novel routing mechanism based on a spectral residual, which is a creative approach to token-adaptive computation.
2. The authors conduct a thorough set of ablations on their proposed 370M parameter model, which helps validate the components of their design, such as the auxiliary losses and the hierarchical filter structure

Weaknesses

1. The paper's primary weakness is that all experiments are confined to a single, small 370M parameter model. The central claim of 58.9% parameter savings is not validated at larger scales (e.g., 1B+), where model dynamics and efficiency trade-offs are known to change. This severely limits the generality and impact of the findings.
2. The paper's core premise of a GSP framework is not very strong. The authors admit that "spectral properties are not explicitly enforced" but rather "indirectly en

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- The formalization of SSM heads into Graph Signal Processing's filter bank is clear and allows principled analysis about low-pass and adaptive behavior.
- The routing/bias mechanism tied to $\Delta_{HADES}$ gives a minimal hook for content-adaptive dynamics which aligns with SSM parametrization.
- Competitive results at a much lower parameter count regime and performance improvements in long-context tasks.
- The ablation study and analysis are thorough.

Weaknesses

- While HADES's competitiveness with a reduced number of parameters seems promising, the architecture does seem to incur additional FLOP overhead. Including this analysis would strengthen the contributions of this work.
- The spectral analysis (FFT) on hidden sequences from one layer may be confounded by the choice of layer or the gamma value.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

Originality:
- Recasts multi-head Mamba2 as a graph filter bank on a line graph, connecting LTV SSMs to graph signal processing (GSP) and framing heads as node-variant graph filters. Introduces a novel architecture, HADES, a hierarchical filter bank with (i) always-on shared filters and (ii) token-routed expert filters, selected via a spectral residual and $\Delta$-modulation.
- Their construction of expert filters creates more opportunity for modular/interpretable filters, as they are trained t

Weaknesses

While I enjoyed the paper's presentation and ideas overall, I think the major weakness of the paper is the strength of its empirical evidence. I will list this along two major axes:
1. **Lack of scale**: Despite the thorough experiments in ablation, sensitivity, and multiple baselines, the paper only operates on the 200B-token Pile training of 370M-parameter models. Because we only get one data point in the (number of tokens trained, model scale), it is hard to know if the model will scale or hol

Taxonomy

Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Topic Modeling