Masking Strategies for Background Bias Removal in Computer Vision Models
Ananthu Aniraj, Cassio F. Dantas, Dino Ienco, Diego Marcos

TL;DR
This paper investigates background bias in fine-grained image classification and proposes two masking strategies, demonstrating that early masking significantly improves out-of-distribution background generalization for CNN and ViT models.
Contribution
It introduces early and late masking strategies to reduce background bias, with early masking showing superior out-of-distribution robustness in vision models.
Findings
Both masking strategies improve OOD background generalization.
Early masking consistently outperforms late masking.
A ViT that classifies from global-average-pooled (GAP) patch tokens, combined with early masking, achieves the highest OOD robustness.
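To make the last finding concrete, the sketch below contrasts the two common ways of turning ViT output tokens into a single image representation: reading the class (CLS) token, or averaging the patch tokens (GAP). It is an illustration only, not the authors' code. The function names are hypothetical, and the token layout (CLS token at index 0, followed by the patch tokens, with no extra register tokens) is an assumption.

```python
import numpy as np


def cls_representation(tokens: np.ndarray) -> np.ndarray:
    """Return the CLS token as the image representation.

    tokens: array of shape (1 + N, D); row 0 is assumed to be the CLS token.
    """
    return tokens[0]


def gap_patch_representation(tokens: np.ndarray) -> np.ndarray:
    """Return the global average of the N patch tokens (CLS token excluded).

    tokens: array of shape (1 + N, D); rows 1..N are assumed to be patch tokens.
    """
    return tokens[1:].mean(axis=0)


if __name__ == "__main__":
    # Toy example: 1 CLS token + 3 patch tokens, embedding size 3.
    toy_tokens = np.arange(12, dtype=np.float64).reshape(4, 3)
    print("CLS:", cls_representation(toy_tokens))
    print("GAP:", gap_patch_representation(toy_tokens))
```

With GAP, every patch token contributes equally to the pooled vector, so any background content that reaches the patch tokens also reaches the classifier. This is one plausible reason why pairing GAP with a masking step matters.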
Abstract
Models for fine-grained image classification tasks, where the difference between some classes can be extremely subtle and the number of samples per class tends to be low, are particularly prone to picking up background-related biases and demand robust methods to handle potential examples with out-of-distribution (OOD) backgrounds. To gain deeper insights into this critical problem, our research investigates the impact of background-induced bias on fine-grained image classification, evaluating standard backbone models such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). We explore two masking strategies to mitigate background-induced bias: early masking, which removes background information at the (input) image level, and late masking, which selectively masks high-level spatial features corresponding to the background. Extensive experiments assess the behavior of CNN…
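The two strategies named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a binary foreground mask is assumed to be available, early masking is shown as zeroing background pixels, and late masking is shown as a masked average over a backbone's spatial feature map. The helper names, the block-average downsampling of the mask, and the 0.5 threshold are all hypothetical choices.

```python
import numpy as np


def early_mask(image: np.ndarray, fg_mask: np.ndarray) -> np.ndarray:
    """Early masking: zero out background pixels before the backbone sees them.

    image:   (H, W, C) array.
    fg_mask: (H, W) array with 1 for foreground and 0 for background.
    """
    return image * fg_mask[..., None]


def downsample_mask(fg_mask: np.ndarray, fh: int, fw: int) -> np.ndarray:
    """Resize a pixel-level mask to a (fh, fw) feature grid.

    Assumption: block-average the mask, then keep cells that are at least
    half foreground. H and W must be divisible by fh and fw.
    """
    h, w = fg_mask.shape
    assert h % fh == 0 and w % fw == 0, "mask size must be a multiple of the grid"
    pooled = fg_mask.reshape(fh, h // fh, fw, w // fw).mean(axis=(1, 3))
    return (pooled >= 0.5).astype(fg_mask.dtype)


def late_mask_pool(features: np.ndarray, fg_mask: np.ndarray) -> np.ndarray:
    """Late masking: average only the feature cells that cover foreground.

    features: (fh, fw, D) spatial feature map from the backbone.
    fg_mask:  (H, W) pixel-level foreground mask.
    """
    fh, fw, _ = features.shape
    grid_mask = downsample_mask(fg_mask, fh, fw)
    # Guard against an all-background mask to avoid dividing by zero.
    denom = max(float(grid_mask.sum()), 1.0)
    return (features * grid_mask[..., None]).sum(axis=(0, 1)) / denom


if __name__ == "__main__":
    image = np.ones((4, 4, 3))
    mask = np.zeros((4, 4))
    mask[:2, :2] = 1.0  # foreground occupies the top-left quadrant
    print(early_mask(image, mask).sum())  # background pixels contribute zero

    feats = np.full((2, 2, 2), 10.0)
    feats[0, 0] = [1.0, 2.0]  # the only foreground feature cell
    print(late_mask_pool(feats, mask))
```

One difference the sketch makes visible: with early masking the backbone never receives background pixels, whereas with late masking the features are computed from the full image, so background information can still leak into foreground feature cells through the network's receptive field or attention before the mask is applied.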
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer · self-DIstillation with NO labels · ConvNeXt
