Mitigating Bias in Concept Bottleneck Models for Fair and Interpretable Image Classification

Schrasing Tong; Antoine Salaun; Vincent Yuan; Annabel Adeyeri; Lalana Kagal

arXiv:2603.05899·cs.CV·March 9, 2026

Open Access

TL;DR

This paper introduces three techniques to reduce bias in concept bottleneck models (CBMs) for image classification, reporting better fairness-performance tradeoffs than prior work while keeping the models interpretable.

Contribution

It proposes three bias mitigation methods for CBMs: a top-k concept filter that decreases information leakage, removal of biased concepts, and adversarial debiasing.

Findings

01 Outperforms prior methods in fairness-performance tradeoffs

02 Reduces gender bias on datasets like ImSitu

03 Improves interpretability and fairness in image classification

Abstract

Ensuring fairness in image classification prevents models from perpetuating and amplifying bias. Concept bottleneck models (CBMs) map images to high-level, human-interpretable concepts before making predictions via a sparse, one-layer classifier. This structure enhances interpretability and, in theory, supports fairness by masking sensitive attribute proxies such as facial features. However, CBM concepts are known to leak information unrelated to concept semantics, and early results reveal only marginal reductions in gender bias on datasets like ImSitu. We propose three bias mitigation techniques to improve fairness in CBMs: (1) decreasing information leakage using a top-k concept filter, (2) removing biased concepts, and (3) adversarial debiasing. Our results outperform prior work in terms of fairness-performance tradeoffs, indicating that our debiased CBM provides a significant step…
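To make the first two techniques concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a top-k concept filter keeps only the k highest-scoring concepts per image and zeroes the rest, and that biased concepts are removed by zeroing their scores before the one-layer classifier. The function names, the example scores, the weights, and the choice of which concept counts as biased are all hypothetical.

```python
from typing import List, Sequence, Set


def top_k_filter(scores: Sequence[float], k: int) -> List[float]:
    """Keep the k largest concept scores and zero out the rest (assumed filter form)."""
    if k <= 0:
        return [0.0] * len(scores)
    # Indices of the k highest-scoring concepts; ties go to the lower index.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    keep = set(ranked[:k])
    return [s if i in keep else 0.0 for i, s in enumerate(scores)]


def drop_concepts(scores: Sequence[float], biased: Set[int]) -> List[float]:
    """Zero out concepts flagged as biased (the flagged indices are hypothetical)."""
    return [0.0 if i in biased else s for i, s in enumerate(scores)]


def linear_predict(
    scores: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> int:
    """One-layer classifier over concept scores: return the class with the highest logit."""
    logits = [
        sum(w * s for w, s in zip(row, scores)) + b
        for row, b in zip(weights, bias)
    ]
    return max(range(len(logits)), key=lambda c: logits[c])


# Hypothetical concept scores for one image (4 concepts, 2 classes).
concept_scores = [0.9, 0.1, 0.7, 0.3]
class_weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
class_bias = [0.0, 0.0]

filtered = top_k_filter(concept_scores, k=2)
debiased = drop_concepts(filtered, biased={0})  # pretend concept 0 is a gender proxy

print(linear_predict(filtered, class_weights, class_bias))
print(linear_predict(debiased, class_weights, class_bias))
```

In this toy setup, removing the flagged concept changes which class the linear layer prefers, which illustrates why concept removal trades off against task performance. Adversarial debiasing requires a training loop and is not sketched here.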

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI