GradPCA: Leveraging NTK Alignment for Reliable Out-of-Distribution Detection
Mariia Seleznova, Hung-Hsu Chou, Claudio Mayrink Verdun, Gitta Kutyniok

TL;DR
GradPCA is a novel OOD detection method that leverages NTK-induced low-rank gradient structures and PCA to improve reliability and consistency across image classification tasks, supported by theoretical insights.
Contribution
The paper introduces GradPCA, a spectral OOD detection technique based on NTK alignment, together with a theoretical framework that explains its effectiveness and the importance of feature quality.
Findings
GradPCA performs more consistently than existing OOD detectors across standard image classification benchmarks.
NTK alignment induces low-rank gradient structures beneficial for detection.
Pretrained features significantly enhance detector performance.
Abstract
We introduce GradPCA, an Out-of-Distribution (OOD) detection method that exploits the low-rank structure of neural network gradients induced by Neural Tangent Kernel (NTK) alignment. GradPCA applies Principal Component Analysis (PCA) to gradient class-means, achieving more consistent performance than existing methods across standard image classification benchmarks. We provide a theoretical perspective on spectral OOD detection in neural networks to support GradPCA, highlighting feature-space properties that enable effective detection and naturally emerge from NTK alignment. Our analysis further reveals that feature quality -- particularly the use of pretrained versus non-pretrained representations -- plays a crucial role in determining which detectors will succeed. Extensive experiments validate the strong performance of GradPCA, and our theoretical framework offers guidance for…
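As a rough illustration of the idea described in the abstract (PCA applied to gradient class-means, followed by a spectral test), the sketch below uses synthetic vectors in place of real network gradients. The function names `fit_class_mean_pca` and `residual_score` are hypothetical, and both the uncentered PCA and the residual-energy score are assumptions made for this example, not the paper's exact definitions.

```python
import numpy as np

def fit_class_mean_pca(grads, labels, n_components):
    """Return an orthonormal basis (P x k) spanning the top principal
    directions of the per-class mean vectors (uncentered PCA via SVD)."""
    classes = np.unique(labels)
    means = np.stack([grads[labels == c].mean(axis=0) for c in classes])  # (C, P)
    _, _, vt = np.linalg.svd(means, full_matrices=False)
    return vt[:n_components].T

def residual_score(grads, basis, eps=1e-12):
    """Fraction of each vector's squared norm lying outside the subspace.
    Higher values mean 'less like the training classes' in this toy setup."""
    total = np.sum(grads ** 2, axis=1)
    inside = np.sum((grads @ basis) ** 2, axis=1)
    return np.clip(total - inside, 0.0, None) / (total + eps)

# Synthetic stand-ins for gradients (assumption: low-rank class structure).
rng = np.random.default_rng(0)
P, C, N = 50, 5, 500                      # vector dim, classes, samples
directions = 5.0 * rng.normal(size=(C, P))
labels = rng.integers(0, C, size=N)
id_grads = directions[labels] + 0.1 * rng.normal(size=(N, P))
ood_grads = rng.normal(size=(100, P))     # no class structure

basis = fit_class_mean_pca(id_grads, labels, n_components=C)
id_scores = residual_score(id_grads, basis)
ood_scores = residual_score(ood_grads, basis)
print("max ID score:", id_scores.max(), "min OOD score:", ood_scores.min())
```

In this toy setting the in-distribution vectors lie almost entirely inside the class-mean subspace, so a threshold between the two score ranges separates them; real gradients would be far higher-dimensional, and the separation would depend on how well the low-rank structure holds.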
Peer Reviews
Decision: ICLR 2026 Poster
Overall, this is a well-written and sophisticated paper with clear motivation, background, and rationale. The writing is made to be accessible to a general deep-learning audience. The paper has a clear theoretical grounding that connects OOD detection to NTK alignment and covariance low-rank structure, offering a mathematically elegant view. The method essentially captures a low-rank representation of the NTK kernel / gradient covariance matrix. Kernel-based reasoning is well-founded; kernel a
The choice of NTK, while natural given the theoretical link, is not unique. Other kernels (e.g., Fisher information, feature-space kernels) could also exhibit similar low-rank behavior, so the generality of the method is not fully demonstrated. The paper would benefit from a clearer articulation of why the NTK is the most appropriate or insightful kernel for connecting spectral structure with OOD behavior. In other words, it needs to “ring a bell” by making the NTK–OOD link feel both necessary a
- Strong Theoretical Contribution: The paper offers a formal framework for spectral OOD detection in NNs. Theorem 4.1, which provides a "sufficient condition for spectral OOD detection", is a strong theoretical result that provides a deterministic, per-sample OOD certificate.
- Excellent Empirical Results & Consistency: GradPCA achieves SOTA or near-SOTA results, but more importantly, it demonstrates the most consistent performance across all benchmarks. This directly addresses a major problem
- Memory Scalability: The primary weakness is that the memory cost scales with the number of classes, $C$. The method stores $O(C)$ gradient vectors, which "can be costly for large C", as in ImageNet. While the paper shows this is manageable (e.g., 7.5GB for ImageNet in the worst case, but often less), it could be a barrier for datasets with thousands or tens of thousands of classes.
- Core Assumption: The method's success relies on the assumption that NTK alignment provides a low-rank str
The paper establishes strong theoretical motivation for the GradPCA framework, gives some reasoning as to why it works and when OOD detectors can be considered reliable, and complements this with empirical results. The proposed method is computationally tractable and therefore more practical (O(NP) to O(C)). It offers decent insights on feature quality, explaining when regularity-based methods outperform abnormality-based methods and vice versa. Good empirical results.
How does GradPCA compare to the similar recent work [1], which also applies PCA to the gradients? It would be good if the authors clarified the major differences and their contributions.
[1] Wu, Yingwen, et al. "Low-dimensional gradient helps out-of-distribution detection." IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).
Only 3 models seem to be used in the experiments. A more diverse set might have introduced models that might not exactly follow the NTK alignment theory
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Fault Detection and Control Systems · Distributed Sensor Networks and Detection Algorithms
Methods: Neural Tangent Kernel
