Pinpoint Counterfactuals: Reducing social bias in foundation models via localized counterfactual generation
Kirill Sirotkin, Marcos Escudero-Viñolo, Pablo Carballeira, Mayug Maniparambil, Catarina Barata, Noel E. O'Connor

TL;DR
This paper introduces a localized counterfactual generation technique that reduces societal biases in foundation models by creating high-fidelity, attribute-specific counterfactual images, leading to fairer models without sacrificing performance on non-human-centric tasks.
Contribution
It proposes a novel localized counterfactual generation method that preserves image context and improves bias mitigation in foundation models.
Findings
Higher visual and semantic fidelity in counterfactuals compared to existing methods.
Bias reduction evidenced by decreased gender classification disparity.
Maintains model performance on non-human-centric tasks.
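The page does not define "gender classification disparity," so the following is only a minimal sketch of one common way such a disparity can be quantified: the gap between the highest and lowest per-group accuracy of a classifier. The choice of metric and the names `per_group_accuracy` and `accuracy_gap` are hypothetical assumptions for illustration, not the paper's evaluation protocol.

```python
from collections import defaultdict


def per_group_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true, y_pred, groups):
    """Hypothetical disparity metric: max minus min per-group accuracy.

    A value of 0.0 means every group is classified equally well;
    a lower value after fine-tuning would indicate reduced disparity.
    """
    acc = per_group_accuracy(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())


# Toy example (made-up labels, not data from the paper).
y_true = ["doctor", "doctor", "nurse", "nurse"]
y_pred = ["doctor", "nurse", "nurse", "nurse"]
groups = ["male", "female", "female", "male"]
print(per_group_accuracy(y_true, y_pred, groups))  # {'male': 1.0, 'female': 0.5}
print(accuracy_gap(y_true, y_pred, groups))        # 0.5
```

Under this assumed definition, comparing the gap for a model fine-tuned on real data alone against one fine-tuned with counterfactuals gives a single number for "decreased disparity"; the paper may instead use other bias metrics.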
Abstract
Foundation models trained on web-scraped datasets propagate societal biases to downstream tasks. While counterfactual generation enables bias analysis, existing methods introduce artifacts by modifying contextual elements like clothing and background. We present a localized counterfactual generation method that preserves image context by constraining counterfactual modifications to specific attribute-relevant regions through automated masking and guided inpainting. When applied to the Conceptual Captions dataset for creating gender counterfactuals, our method results in higher visual and semantic fidelity than state-of-the-art alternatives, while maintaining the performance of models trained using only real data on non-human-centric tasks. Models fine-tuned with our counterfactuals demonstrate measurable bias reduction across multiple metrics, including a decrease in gender…
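The abstract's central idea is that counterfactual edits are confined to attribute-relevant regions, so that clothing, background, and other context stay untouched. The following is a minimal sketch of that constraint as a masked composite, assuming a mask has already been produced by some automated masking step and an edited image by some guided inpainting model (neither is shown). The function name `composite_counterfactual` and the grayscale nested-list image format are hypothetical choices for illustration, not the authors' implementation.

```python
def composite_counterfactual(original, edited, mask):
    """Blend an edited image into the original only inside a mask.

    original, edited: 2-D lists of pixel values with identical shape.
    mask: 2-D list of values in [0, 1]; 1 marks an attribute-relevant
    pixel to take from `edited`, 0 marks context to keep from `original`.
    Fractional values give a soft blend at mask boundaries.
    """
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("images and mask must have the same number of rows")
    result = []
    for row_o, row_e, row_m in zip(original, edited, mask):
        if not (len(row_o) == len(row_e) == len(row_m)):
            raise ValueError("images and mask must have the same row length")
        # Outside the mask (m == 0) the original pixel is copied exactly.
        result.append([m * e + (1 - m) * o for o, e, m in zip(row_o, row_e, row_m)])
    return result


# Toy 2x2 example: only the top-left pixel is attribute-relevant.
original = [[10, 10], [10, 10]]
edited = [[99, 99], [99, 99]]  # stands in for an inpainting model's output
mask = [[1, 0], [0, 0]]
print(composite_counterfactual(original, edited, mask))  # [[99, 10], [10, 10]]
```

Compositing this way guarantees that pixels outside the mask are identical to the source image, which is one simple way to preserve context; a real pipeline would operate on RGB arrays, and the paper's inpainting step may enforce locality differently.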
Taxonomy
Topics: Opinion Dynamics and Social Influence
Methods: Counterfactual Explanations
