Why Training-Free Token Reduction Collapses: The Inherent Instability of Pairwise Scoring Signals

Yang Shanglin

arXiv:2604.16745 · cs.AI · April 21, 2026

TL;DR

This paper investigates why training-free token reduction methods for Vision Transformers fail at high compression, revealing inherent instability in pairwise similarity signals and proposing a diagnostic framework and a new method, CATIS, to improve stability and performance.

Contribution

The paper introduces a diagnostic framework built on two measures, ranking consistency and off-diagonal correlation, to analyze why the collapse occurs. It also proposes CATIS, a method based on unary signals, to make token reduction more stable.

Findings

01. Pairwise similarity signals degrade significantly in deep layers: their ranking consistency falls from $\rho_s = 0.88$ to $0.27$.

02. Unary signals are more stable than pairwise signals because they are less sensitive to perturbations.

03. CATIS achieves near-original accuracy at a 63% FLOPs reduction, outperforming the baselines.

Abstract

Training-free token reduction methods for Vision Transformers (ToMe, ToFu, PiToMe, and MCTF) employ different scoring mechanisms, yet they share a closely matched cliff-like collapse at high compression. This paper explains why. We develop a diagnostic framework with two tools, ranking consistency $\rho_s$ and off-diagonal correlation $\rho_{\mathrm{off}}$, that decomposes the collapse into (1) a signal-agnostic error amplifier inherent to layer-wise reduction, predicting convex Pareto curves and $r_{\mathrm{crit}} \propto 1/L$; and (2) shared reliance on pairwise similarity signals whose ranking consistency degrades from $\rho_s = 0.88$ to $0.27$ in deep layers. Pairwise rankings are inherently unstable ($O(N_p^2)$ joint perturbations) while unary signals enjoy greater stability ($O(N_p)$ perturbations, CLT). From three design principles derived from this diagnosis, we construct…
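To make the notion of ranking consistency concrete, here is a minimal sketch that measures how well a scoring signal preserves its ranking when token features are perturbed. It rests on assumptions not confirmed by the excerpt above: ranking consistency is taken to be a Spearman rank correlation between scores before and after perturbation, the pairwise signal is cosine similarity over token pairs, and the unary signal is a hypothetical stand-in (the token's L2 norm), not the CATIS score. The token data is synthetic Gaussian noise, so the printed numbers illustrate the measurement only and do not reproduce the paper's results.

```python
import math
import random


def ranks(values):
    """0-based ranks of the values (continuous scores, so ties are not expected)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    mean = (len(a) - 1) / 2
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var = sum((x - mean) ** 2 for x in ra)  # same for rb when there are no ties
    return cov / var


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def pairwise_scores(tokens):
    """One score per token pair (i < j): cosine similarity (assumed pairwise signal)."""
    n = len(tokens)
    return [
        sum(x * y for x, y in zip(tokens[i], tokens[j]))
        / (l2_norm(tokens[i]) * l2_norm(tokens[j]))
        for i in range(n)
        for j in range(i + 1, n)
    ]


def unary_scores(tokens):
    """One score per token: L2 norm (hypothetical unary signal for illustration)."""
    return [l2_norm(t) for t in tokens]


def perturb(tokens, sigma, rng):
    """Add independent Gaussian noise to every feature of every token."""
    return [[x + rng.gauss(0.0, sigma) for x in t] for t in tokens]


rng = random.Random(0)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]
noisy = perturb(tokens, sigma=0.5, rng=rng)

rho_pairwise = spearman(pairwise_scores(tokens), pairwise_scores(noisy))
rho_unary = spearman(unary_scores(tokens), unary_scores(noisy))
print(f"pairwise ranking consistency: {rho_pairwise:.3f}")
print(f"unary ranking consistency:    {rho_unary:.3f}")
```

Each pairwise score depends on the noise in two tokens at once, whereas each unary score depends on the noise in one token only; the sketch lets that difference be explored by varying `sigma`, the token count, and the feature dimension.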
