Towards Worst-Case Guarantees with Scale-Aware Interpretability

Lauren Greenspan; David Berman; Aryeh Brill; Ro Jefferson; Artemy Kolchinsky; Jennifer Lin; Andrew Mack; Anindita Maiti; Fernando E. Rosas; Alexander Stapleton; Lucas Teixeira; Dmitry Vaintrob

arXiv:2602.05184·hep-th·February 6, 2026

Open Access

TL;DR

This paper proposes scale-aware interpretability, a research agenda that applies renormalisation techniques from physics to neural networks, with the aim of tracking how features compose across resolutions and bounding the influence of fine-grained structure that is discarded.

Contribution

It introduces a unifying research agenda combining physics-based methods with AI interpretability to improve robustness and faithfulness of explanations.

Findings

01. Framework based on renormalisation from physics for interpretability

02. Synthesis of interdisciplinary research into practical tools

03. Potential for formal robustness guarantees in model explanations

Abstract

Neural networks organize information according to the hierarchical, multi-scale structure of natural data. Methods to interpret model internals should be similarly scale-aware, explicitly tracking how features compose across resolutions and guaranteeing bounds on the influence of fine-grained structure that is discarded as irrelevant noise. We posit that the renormalisation framework from physics can meet this need by offering technical tools that can overcome limitations of current methods. Moreover, relevant work from adjacent fields has now matured to a point where scattered research threads can be synthesized into practical, theory-informed tools. To combine these threads in an AI safety context, we propose a unifying research agenda, scale-aware interpretability, to develop formal machinery and interpretability tools that have robustness and faithfulness properties…
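As a rough illustration of what "guaranteeing bounds on the influence of fine-grained structure that is discarded" could mean in the simplest possible setting, the sketch below coarse-grains a signal by block averaging and bounds how much a linear readout can change as a result. This is a toy example and not the paper's method: the linear readout, the block-averaging scheme, and all function names (`coarse_grain`, `readout`, `discarded_influence_bound`) are assumptions made here for illustration. The bound itself is just the Cauchy-Schwarz inequality, |w·x − w·x_c| ≤ ‖w‖ ‖x − x_c‖.

```python
import math
import random


def coarse_grain(x, block):
    """Replace each chunk of `block` consecutive entries by the chunk mean."""
    out = []
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out


def l2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def readout(w, x):
    """Hypothetical linear readout standing in for a model (an assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def discarded_influence_bound(w, x, block):
    """Worst-case change in the readout caused by coarse-graining x.

    By Cauchy-Schwarz, |w.x - w.x_c| <= ||w|| * ||x - x_c||, where x_c is the
    block-averaged signal and x - x_c is the discarded fine-grained detail.
    """
    x_c = coarse_grain(x, block)
    residual = [a - b for a, b in zip(x, x_c)]
    return l2(w) * l2(residual)


random.seed(0)
# Slowly varying signal plus small fine-grained noise.
x = [math.sin(0.1 * i) + 0.05 * random.gauss(0, 1) for i in range(64)]
w = [random.gauss(0, 1) for _ in range(64)]

for block in (1, 2, 4, 8):
    gap = abs(readout(w, x) - readout(w, coarse_grain(x, block)))
    bound = discarded_influence_bound(w, x, block)
    print(f"block={block}: actual change={gap:.4f}, guaranteed bound={bound:.4f}")
```

The printed change never exceeds the bound, whatever the weights are, which is the sense in which the guarantee is worst-case. Real networks are nonlinear, so an analogous statement would need a different argument (for example a Lipschitz constant in place of ‖w‖); nothing here should be read as the formal machinery the agenda proposes to develop.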

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis