Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent

Christy Li; Josep Lopez Camuñas; Jake Thomas Touchet; Jacob Andreas; Agata Lapedriza; Antonio Torralba; Tamar Rott Shaham

arXiv:2510.21704·cs.CV·November 20, 2025

TL;DR

This paper presents an automated framework in which a self-reflective agent systematically hypothesizes and tests which visual attributes a trained vision model relies on, supporting analysis of model robustness and interpretability.

Contribution

The paper introduces a self-reflective agent that iteratively hypothesizes and verifies visual attribute dependencies in trained vision models, giving an automated way to interpret model behavior and to surface unintended reliance on specific features.

Findings

01  The agent's performance improves with self-reflection.

02  The approach outperforms non-reflective baselines.

03  It identifies real-world attribute dependencies in state-of-the-art models.

Abstract

When a vision model performs image recognition, which visual attributes drive its predictions? Detecting unintended reliance on specific visual features is critical for ensuring model robustness, preventing overfitting, and avoiding spurious correlations. We introduce an automated framework for detecting such dependencies in trained vision models. At the core of our method is a self-reflective agent that systematically generates and tests hypotheses about visual attributes that a model may rely on. This process is iterative: the agent refines its hypotheses based on experimental outcomes and uses a self-evaluation protocol to assess whether its findings accurately explain model behavior. When inconsistencies arise, the agent self-reflects over its findings and triggers a new cycle of experimentation. We evaluate our approach on a novel benchmark of 130 models designed to exhibit diverse…
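The hypothesize, test, self-evaluate, and reflect cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "model" is a hand-written scoring function over binary attributes, and every name here (`toy_model`, `ATTRIBUTES`, `make_image`, `run_experiment`, `self_evaluate`, `detect_reliance`) is a hypothetical stand-in chosen only to show the control flow.

```python
from itertools import product

# Hypothetical toy "vision model": scores an image described by binary attributes.
# By construction it relies on the background as well as on the object itself.
def toy_model(image):
    return 0.5 * image["object"] + 0.4 * image["beach_background"]

# Assumed candidate attributes the agent may hypothesize about.
ATTRIBUTES = ["red_color", "beach_background", "large_size"]

def make_image(**overrides):
    """Build a toy image description; all attributes are off unless overridden."""
    image = {"object": 1, "red_color": 0, "beach_background": 0, "large_size": 0}
    image.update(overrides)
    return image

def run_experiment(model, attribute):
    """Intervene on one attribute and return the resulting change in score."""
    return model(make_image(**{attribute: 1})) - model(make_image(**{attribute: 0}))

def self_evaluate(model, attribute, threshold=0.1):
    """Check the hypothesis in every combination of the other attributes."""
    others = [a for a in ATTRIBUTES if a != attribute]
    for values in product([0, 1], repeat=len(others)):
        context = dict(zip(others, values))
        with_attr = model(make_image(**context, **{attribute: 1}))
        without_attr = model(make_image(**context, **{attribute: 0}))
        if with_attr - without_attr < threshold:
            return False  # the hypothesis does not explain this case
    return True

def detect_reliance(model, max_rounds=5):
    """Iterate: hypothesize, test, self-evaluate, and reflect on failures."""
    rejected = []
    for _ in range(max_rounds):
        candidates = [a for a in ATTRIBUTES if a not in rejected]
        if not candidates:
            break
        # Hypothesis: the untested attribute with the largest measured effect.
        effects = {a: run_experiment(model, a) for a in candidates}
        hypothesis = max(effects, key=effects.get)
        if self_evaluate(model, hypothesis):
            return hypothesis, rejected
        # Reflection step: record the failed hypothesis and start a new cycle.
        rejected.append(hypothesis)
    return None, rejected

if __name__ == "__main__":
    print(detect_reliance(toy_model))
```

In this sketch the reflection step only discards a failed hypothesis before the next round; the agent described in the abstract refines its hypotheses based on experimental outcomes, which a fixed attribute list cannot capture.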
