NeuronTune: Towards Self-Guided Spurious Bias Mitigation
Guangtao Zheng, Wenqian Ye, Aidong Zhang

TL;DR
NeuronTune is a post hoc method that identifies and regulates neurons responsible for spurious biases in neural networks, improving robustness without needing external annotations of biases.
Contribution
It introduces a self-guided, post hoc approach to mitigate spurious bias by intervening in the model's internal neuron activations, without relying on external bias annotations.
Findings
Significantly reduces spurious bias across architectures.
Operates without external bias annotations.
Improves model robustness in various data modalities.
Abstract
Deep neural networks often develop spurious bias: a reliance on correlations between non-essential features and classes for predictions. For example, a model may identify objects based on frequently co-occurring backgrounds rather than intrinsic features, resulting in degraded performance on data lacking these correlations. Existing mitigation approaches typically depend on external annotations of spurious correlations, which may be difficult to obtain and are not relevant to the spurious bias in a model. In this paper, we take a step towards self-guided mitigation of spurious bias by proposing NeuronTune, a post hoc method that directly intervenes in a model's internal decision process. Our method probes a model's latent embedding space to identify and regulate neurons that lead to spurious prediction behaviors. We theoretically justify our approach and show that it brings the model…
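The abstract says the method identifies and regulates neurons in the latent embedding space but, as excerpted here, does not say how. The sketch below is only one plausible reading, not the authors' implementation. It assumes that a neuron (embedding dimension) is suspect for a class when its median activation is higher on misclassified samples of that class than on correctly classified ones, and that "regulating" means zeroing that dimension. The function names `flag_suspect_neurons` and `suppress_neurons` and the median-gap criterion are hypothetical.

```python
import numpy as np

def flag_suspect_neurons(embeddings, labels, preds):
    """Flag embedding dimensions that look tied to wrong predictions.

    ASSUMPTION (not from the abstract): a dimension is suspect for a class
    if its median activation on misclassified samples of that class exceeds
    its median activation on correctly classified samples of that class.
    Returns a boolean mask of shape (dim,), True where any class flags it.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    suspect = np.zeros(embeddings.shape[1], dtype=bool)
    for c in np.unique(labels):
        in_class = labels == c
        wrong = in_class & (preds != c)
        right = in_class & (preds == c)
        if not wrong.any() or not right.any():
            continue  # no contrast available for this class
        gap = np.median(embeddings[wrong], axis=0) - np.median(embeddings[right], axis=0)
        suspect |= gap > 0
    return suspect

def suppress_neurons(embeddings, suspect):
    """Zero the flagged dimensions (one simple, assumed form of 'regulation')."""
    return np.where(suspect, 0.0, np.asarray(embeddings, dtype=float))

# Toy usage: dimension 1 is high only on the misclassified samples.
emb = np.array([[1.0, 0.1], [1.0, 0.2], [1.0, 0.9], [1.0, 0.8]])
y = np.array([0, 0, 0, 0])
y_hat = np.array([0, 0, 1, 1])
mask = flag_suspect_neurons(emb, y, y_hat)
cleaned = suppress_neurons(emb, mask)
```

In a post hoc setting, a natural next step under the same assumptions would be to refit only the final classification layer on the suppressed embeddings, leaving the feature extractor untouched. Only model predictions and class labels are used here, which matches the abstract's claim that no external bias annotations are needed.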
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: High-Order Consensuses
