Diagnosing Generalization Failures from Representational Geometry Markers
Chi-Ning Chou, Artem Kirsanov, Yao-Yuan Yang, SueYeon Chung

TL;DR
This paper introduces a top-down approach that uses representational geometry markers to predict and diagnose generalization failures in AI models, especially in out-of-distribution (OOD) scenarios, with the aim of improving model robustness and interpretability.
Contribution
It proposes a novel geometric marker-based method to forecast model performance on unseen data, moving beyond mechanistic explanations to system-level indicators.
Findings
Geometric properties of in-distribution (ID) object manifolds predict out-of-distribution (OOD) performance.
Reductions in effective manifold dimensionality and utility forecast weaker OOD generalization.
Geometric patterns outperform ID accuracy in predicting transfer learning success.
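As a rough illustration of what "effective manifold dimensionality" can mean, the sketch below computes the participation ratio of the covariance spectrum for one class's feature vectors. This is a common proxy and an assumption made here for illustration; the paper's own markers may be defined differently. The function name `participation_ratio` and the synthetic data are hypothetical.

```python
import numpy as np


def participation_ratio(features):
    """Effective dimension proxy: (sum of eigenvalues)^2 / sum of squared eigenvalues.

    `features` is an (n_samples, n_dims) array holding the representations
    of one object class (one "manifold"). Hypothetical helper, not the
    paper's marker.
    """
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to
    # the eigenvalues of the sample covariance matrix.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values ** 2 / max(features.shape[0] - 1, 1)
    total = eigenvalues.sum()
    if total == 0:
        return 0.0  # all points coincide, so there is no spread to measure
    return float(total ** 2 / (eigenvalues ** 2).sum())


# Synthetic example (assumed data, not from the paper): a manifold confined
# to a 3-dimensional subspace versus one spread over all 64 dimensions.
rng = np.random.default_rng(0)
low_dim = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 64))
full_dim = rng.normal(size=(200, 64))
print("low-dimensional manifold:", participation_ratio(low_dim))
print("full-dimensional manifold:", participation_ratio(full_dim))
```

The participation ratio never exceeds the rank of the centered data, so the first value is at most 3, while the second is much larger. In practice one would compute such a marker per class on ID data and track it across models or training settings.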
Abstract
Generalization, the ability to perform well beyond the training context, is a hallmark of biological and artificial intelligence, yet anticipating unseen failures remains a central challenge. Conventional approaches often take a "bottom-up" mechanistic route by reverse-engineering interpretable features or circuits to build explanatory models. While insightful, these methods often struggle to provide the high-level, predictive signals for anticipating failure in real-world deployment. Here, we propose using a "top-down" approach to studying generalization failures inspired by medical biomarkers: identifying system-level measurements that serve as robust indicators of a model's future performance. Rather than mapping out detailed internal mechanisms, we systematically design and test network markers to probe structure-function links, identify prognostic indicators, and validate…
Peer Reviews
Decision · ICLR 2026 Poster
* Fantastic scientific exposition of the idea, the design, and the results.
* The found candidates for OOD failure markers are interesting and non-trivial. Thus, it may trigger future research on the mechanism beyond their contribution to identifying OOD failures, leading to a breakthrough in our understanding of the issue.
* The manuscript is performing what is called in statistical literature "a fishing expedition" for markers. For a fixed set of datasets, had the author tested thousands of markers, they could have reported selectively on the most promising ones, thus finding markers that succeed by chance and do not generalize to other datasets. I don't suspect the authors' ethics -- but they need to take measures against such a mistake. The authors can safeguard against random markers by calculating the number
1. The paper is extremely well-written, clear, and argues its central claims well. Despite relying heavily on prior work in GLUE, and therefore having little space to go over the theory, the authors do a good job of providing the necessary intuition for various concepts.
2. The empirical evaluation is comprehensive, covering many model architectures, datasets, and hyper-parameter configurations. Not only does this go a long way towards bolstering the authors' claims, I believe the existence of th
1. While I appreciate the novelty of the medical framing, ultimately, none of the technical aspects of this framing translate into the framework. Instead of providing new insight, this perspective only seemed to confuse me. For example, the connection to "biomarkers" is far less important than emphasizing that measures of performance need to be task-relevant *as well as* descriptive of underlying mechanisms. The framing is not a deal-breaker, but I believe the paper would be stronger if it spent
1. The topic is of great importance: being able to reliably predict a model's OOD performance without having access to the OOD dataset would be highly valuable.
2. The presentation of the whole work is clear and interesting, with some room for improvement regarding the technical details (which I'll discuss later).
3. I particularly appreciate the authors' debating current trends which overly focus on studying models using tools developed in mathematics or physics. The idea to draw more inspiration
1. The main issue I see with this work is the lack of comparison to previous works studying the problem. While the authors do reference several papers in the related work, they seem to miss the core works that study the same questions. For instance, [1] introduced the Tunnel Effect Hypothesis, showing that the drop in OOD performance is strongly correlated with the numerical rank of representations. Further, [2] refined the Tunnel Hypothesis, showing how the Tunnel Effect (and thus OOD performance)
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Cell Image Analysis Techniques · Explainable Artificial Intelligence (XAI)
