Adjoint Inversion Reveals Holographic Superposition and Destructive Interference in CNN Classifiers
Kaixiang Shu

TL;DR
This paper introduces a novel inversion framework that reveals CNN classifiers operate via destructive interference, challenging the traditional Spatial Funnel Hypothesis and providing new insights into model interpretability and OOD failure.
Contribution
It presents a hallucination-free inversion method that uncovers pixel-level superposition and interference in CNNs, with a covariance-volume channel selection algorithm and OOD failure analysis.
Findings
CNN encoders exhibit strong superposition at the pixel level.
Classification arises from destructive interference between channels rather than from background suppression.
Covariance volume collapse correlates with out-of-distribution failures.
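This summary does not define "covariance volume". One common reading, assumed here, is the log-determinant of the covariance matrix of channel activations, which measures how much volume the features span. The sketch below illustrates that reading only. The function name `log_covariance_volume`, the regularizer `eps`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def log_covariance_volume(features, eps=1e-6):
    """Log-determinant of the (regularized) channel covariance.

    features: array of shape (n_samples, n_channels).
    eps: small ridge term (an assumption) that keeps the determinant finite
         when the covariance is nearly singular.
    """
    features = np.asarray(features, dtype=np.float64)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    cov = cov + eps * np.eye(features.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return logdet

rng = np.random.default_rng(0)

# Synthetic "healthy" features: four roughly independent channels.
spread = rng.normal(size=(500, 4))

# Synthetic "collapsed" features: four channels that are near-copies of one signal.
base = rng.normal(size=(500, 1))
collapsed = base @ np.ones((1, 4)) + 1e-3 * rng.normal(size=(500, 4))

vol_spread = log_covariance_volume(spread)
vol_collapsed = log_covariance_volume(collapsed)
print(f"spread: {vol_spread:.2f}, collapsed: {vol_collapsed:.2f}")
```

Under this assumed definition, the collapsed features give a much lower log-volume than the spread features. That is the sense in which a volume "collapse" could be monitored as a warning signal.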
Abstract
A foundational assumption in CNN interpretability -- that deep encoders suppress background pixels while classifiers merely select from a cleaned feature pool (the Spatial Funnel Hypothesis) -- remains untested due to spatial hallucinations in existing visualization tools. We address this by introducing a hallucination-free inversion framework built on magnitude-phase decoupling and Local Adjoint Correctors. Our method mathematically guarantees that the spatial gradient support of every reconstruction stems strictly from genuinely active channels. Using this framework as a geometric probe, we uncover the first pixel-level evidence of strong superposition in vision encoders. We show that per-channel inversions are uniformly holographic: positive and negative weight reconstructions are visually and energetically indistinguishable. However, their algebraic sum sharply concentrates on the…
