
TL;DR
This paper introduces SSMProbe, a permutation-sensitive probing framework that uses linear time-invariant (LTI) dynamics to analyze how much token order matters in frozen visual representations from models such as MAE and ViT.
Contribution
It challenges the permutation-invariant paradigm by modeling token order as an information scheduling problem with a differentiable soft permutation approach.
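A minimal sketch of what a differentiable soft permutation can look like, assuming the Sinkhorn normalization named in the abstract: a learnable score matrix is exponentiated and then alternately row- and column-normalized until it is approximately doubly stochastic. The function names, the temperature `tau`, and the iteration count are illustrative assumptions, not the paper's parameterization.

```python
import math


def sinkhorn(scores, tau=1.0, n_iters=50):
    """Map an n x n score matrix to an approximately doubly stochastic matrix.

    Illustrative only: `tau` and `n_iters` are assumed hyperparameters.
    """
    n = len(scores)
    # Subtract the global max before exponentiating for numerical stability.
    m = max(max(row) for row in scores)
    p = [[math.exp((s - m) / tau) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: every row sums to 1.
        p = [[v / sum(row) for v in row] for row in p]
        # Column normalization: every column sums to 1.
        col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return p


def soft_permute(p, tokens):
    """Output position i is a weighted mix of all tokens, with weights p[i]."""
    n, dim = len(tokens), len(tokens[0])
    return [
        [sum(p[i][j] * tokens[j][d] for j in range(n)) for d in range(dim)]
        for i in range(len(p))
    ]


# Hypothetical 3-token example; in a probe, `scores` would be learned.
scores = [[1.0, 0.2, 0.1], [0.3, 0.9, 0.2], [0.1, 0.4, 1.2]]
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
P = sinkhorn(scores, tau=0.5, n_iters=50)
reordered = soft_permute(P, tokens)
```

Because every step is a smooth function of `scores`, gradients from a downstream loss can flow back into the ordering. Lowering `tau` pushes the matrix closer to a hard permutation, at the cost of sharper gradients.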
Findings
Learned soft permutation outperforms fixed scans on localized patch features.
Pre-training objectives influence token structure and heterogeneity.
Order-dependent performance varies with token placement and pre-training method.
Abstract
Standard representation probing for visual models relies on mathematically permutation-invariant operations like Global Average Pooling (GAP) or CLS tokens, treating patch representations as an unstructured bag-of-words. We challenge this paradigm by demonstrating that token order is a critical, exploitable dimension in frozen visual representations (e.g., MAE, BEiT, DINOv2, and ViT as a CLS-ablation extreme). We propose SSMProbe, a probing framework driven by a State Space Model (SSM). Operating as discrete Linear Time-Invariant (LTI) dynamical systems, SSMs act as permutation-sensitive probes in which sequence order strictly dictates the final state due to inherent memory decay. Formulating token ordering as an information scheduling problem, we compare fixed scan heuristics against a differentiable soft permutation (Sinkhorn-based) learned from downstream supervision. Evaluations on…
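To make the contrast between GAP and an LTI probe concrete, here is a toy sketch. It assumes the simplest possible LTI recurrence, h_t = a * h_{t-1} + b * x_t with scalar `a` and `b` and |a| < 1. The paper's SSM presumably uses learned matrices, so the names and values below are illustrative only. Unrolling the recurrence gives h_T = sum over t of a^(T-t) * b * x_t, so earlier tokens are down-weighted geometrically and reordering the sequence changes the final state.

```python
def gap(tokens):
    """Global Average Pooling: the mean over tokens, independent of order."""
    n, dim = len(tokens), len(tokens[0])
    return [sum(t[d] for t in tokens) / n for d in range(dim)]


def lti_final_state(tokens, a=0.9, b=1.0):
    """Final state of h_t = a * h_{t-1} + b * x_t, starting from h_0 = 0.

    Scalar `a` and `b` are a simplifying assumption for this sketch.
    """
    h = [0.0] * len(tokens[0])
    for x in tokens:
        h = [a * h_d + b * x_d for h_d, x_d in zip(h, x)]
    return h


# Three toy 2-dimensional "patch tokens" and the same tokens in reverse order.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
reversed_tokens = tokens[::-1]

print(gap(tokens), gap(reversed_tokens))        # [1.0, 0.0] [1.0, 0.0]
print(lti_final_state(tokens, a=0.5))           # [2.25, -0.5]
print(lti_final_state(reversed_tokens, a=0.5))  # [1.5, 0.25]
```

GAP returns the same vector for both orderings, while the LTI final state differs because the most recent tokens are decayed the least. This is the sense in which choosing an order acts as a schedule for which information survives.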
