Model Guidance via Robust Feature Attribution

Mihnea Ghitu; Vihari Piratla; Matthew Wicker

arXiv:2506.19680 · cs.LG · September 23, 2025

TL;DR

This paper introduces a new method for guiding models to focus on relevant features by optimizing explanation robustness, effectively reducing reliance on shortcut features across various domains, including NLP and medical imaging.

Contribution

We propose a simplified objective that improves explanation robustness and mitigates shortcut learning, with theoretical justification and extensive empirical validation across multiple tasks.

Findings

01. Reduces test-time misclassifications by 20% compared to state-of-the-art.

02. Effective across diverse domains including NLP and medical imaging.

03. Highlights the importance of annotation quality over quantity.

Abstract

Controlling the patterns a model learns is essential to preventing reliance on irrelevant or misleading features. Such reliance on irrelevant features, often called shortcut features, has been observed across domains, including medical imaging and natural language processing, where it may lead to real-world harms. A common mitigation strategy leverages annotations (provided by humans or machines) indicating which features are relevant or irrelevant. These annotations are compared to model explanations, typically in the form of feature salience, and used to guide the loss function during training. Unfortunately, recent works have demonstrated that feature salience methods are unreliable and therefore offer a poor signal to optimize. In this work, we propose a simplified objective that simultaneously optimizes for explanation robustness and mitigation of shortcut learning. Unlike prior…
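To make the annotation-guided strategy described in the abstract concrete, the following is a minimal sketch of the general idea: a task loss plus a penalty on the salience a model assigns to features annotated as irrelevant. It illustrates the common prior approach, not the objective proposed in this paper. The function name `guided_loss`, the weight `lam`, the use of a linear logistic model, and the choice of input-gradient salience are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def guided_loss(w, b, X, y, irrelevant_mask, lam=1.0):
    """Cross-entropy plus a penalty on salience over masked features.

    Hypothetical helper for illustration only. `irrelevant_mask` has the
    same shape as X, with 1 marking a feature annotated as irrelevant.
    """
    p = sigmoid(X @ w + b)
    eps = 1e-12  # avoids log(0)
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    # For a linear model, the gradient of the logit with respect to the
    # input is w for every example, so w serves as the salience map here.
    salience = np.broadcast_to(w, X.shape)

    # Penalize squared salience only where the annotation says "irrelevant".
    penalty = np.mean(np.sum((irrelevant_mask * salience) ** 2, axis=1))
    return ce + lam * penalty, ce, penalty


# Toy data: the second feature is annotated as a shortcut for every example.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 0.0])
mask = np.array([[0.0, 1.0], [0.0, 1.0]])
w = np.array([1.0, 2.0])

total, ce, penalty = guided_loss(w, 0.0, X, y, mask, lam=0.5)
```

In this toy setup the penalty depends only on the weight placed on the masked feature, so minimizing the combined loss pushes that weight toward zero. With a nonlinear model the salience map would vary per example and would typically be computed by automatic differentiation.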

Taxonomy

Topics: Adversarial Robustness in Machine Learning