Rethink MAE with Linear Time-Invariant Dynamics

Zice Wang

arXiv:2605.00915·cs.CV·May 5, 2026

TL;DR

This paper introduces SSMProbe, a permutation-sensitive probing framework that uses linear time-invariant dynamics to measure how much token order matters in frozen visual representations such as MAE and ViT.

Contribution

It challenges the permutation-invariant paradigm by modeling token order as an information scheduling problem with a differentiable soft permutation approach.
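The abstract names the soft permutation as Sinkhorn-based. The following is a minimal sketch of that general technique, not the paper's implementation: the function names `sinkhorn` and `soft_permute`, the temperature parameter, the iteration count, and the toy score matrix are all assumptions made for illustration. It alternates row and column normalization so that a matrix of learnable scores becomes approximately doubly stochastic, then uses it to mix tokens into a new order.

```python
import math


def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Turn a square score matrix into an approximately doubly stochastic one.

    Hypothetical helper: alternates row and column normalisation of
    exp(scores / temperature). Every step is differentiable, which is what
    lets an ordering be learned from a downstream loss.
    """
    n = len(scores)
    p = [[math.exp(s / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Make every row sum to 1.
        p = [[v / sum(row) for v in row] for row in p]
        # Make every column sum to 1.
        col = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col[j] for j in range(n)] for i in range(n)]
    return p


def soft_permute(p, tokens):
    """Output position i is a weighted mix of all tokens, weighted by row p[i]."""
    dim = len(tokens[0])
    return [
        [sum(p[i][j] * tokens[j][k] for j in range(len(tokens))) for k in range(dim)]
        for i in range(len(p))
    ]


# Toy scores (assumed values): entry [i][j] scores placing token j at position i.
scores = [[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 2.0]]
soft_p = sinkhorn(scores)

# A hard permutation matrix is the limiting case of a soft one.
swap = [[0.0, 1.0], [1.0, 0.0]]
swapped = soft_permute(swap, [[1.0, 2.0], [3.0, 4.0]])
```

In this sketch a lower temperature pushes `soft_p` toward a hard permutation, while a higher one keeps the mixing diffuse; how the paper sets or anneals this is not stated in the text above.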

Findings

01

Learned soft permutation outperforms fixed scans on localized patch features.

02

Pre-training objectives influence token structure and heterogeneity.

03

Order-dependent performance varies with token placement and pre-training method.

Abstract

Standard representation probing for visual models relies on mathematically permutation-invariant operations like Global Average Pooling (GAP) or CLS tokens, treating patch representations as an unstructured bag-of-words. We challenge this paradigm by demonstrating that token order is a critical, exploitable dimension in frozen visual representations (e.g., MAE, BEiT, DINOv2, and ViT as CLS-ablation extreme). We propose SSMProbe, a probing framework driven by a State Space Model (SSM). Operating as discrete Linear Time-Invariant (LTI) dynamical systems, SSMs act as permutation-sensitive probes where sequence order strictly dictates the final state due to inherent memory decay. Formulating token ordering as an information scheduling problem, we compare fixed scan heuristics against a differentiable soft permutation (Sinkhorn-based) learned from downstream supervision. Evaluations on…
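The contrast the abstract draws between permutation-invariant pooling and an order-sensitive LTI probe can be sketched in a few lines. This is an illustrative toy under stated assumptions, not SSMProbe itself: it uses a scalar recurrence h_t = a·h_{t-1} + b·x_t applied per channel, and the names `gap` and `lti_final_state`, the values of `a` and `b`, and the three example tokens are all hypothetical.

```python
def gap(tokens):
    """Global average pooling: the mean over tokens, independent of their order."""
    n = len(tokens)
    dim = len(tokens[0])
    return [sum(t[k] for t in tokens) / n for k in range(dim)]


def lti_final_state(tokens, a=0.5, b=1.0):
    """Run h_t = a * h_{t-1} + b * x_t per channel and return the final state.

    With |a| < 1 the token seen at step t is scaled by a**(T - t) in the final
    state, so earlier tokens are remembered less than later ones.
    """
    h = [0.0] * len(tokens[0])
    for x in tokens:
        h = [a * hk + b * xk for hk, xk in zip(h, x)]
    return h


# Three toy "patch tokens" with two channels each (assumed values).
tokens = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
reversed_tokens = tokens[::-1]

pooled_fwd = gap(tokens)
pooled_rev = gap(reversed_tokens)
state_fwd = lti_final_state(tokens)           # [4.25, 5.0]
state_rev = lti_final_state(reversed_tokens)  # [2.0, 2.0]
```

Reversing the sequence leaves the pooled vector unchanged but changes the final state, because the token `[4.0, 4.0]` is weighted by 1 when it comes last and by 0.25 when it comes first. That decay is what makes the ordering of tokens a scheduling decision for such a probe.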
