Locating Demographic Bias at the Attention-Head Level in CLIP's Vision Encoder
Alaa Yasser, Kittipat Phunjanna, Marcos Escudero Viñolo, Catarina Barata, Jenny Benois-Pineau

TL;DR
This paper introduces a method to locate demographic bias within individual attention heads of CLIP's vision transformer, revealing that bias can be localized and mitigated at the head level, with varying effectiveness across attributes.
Contribution
It presents a novel mechanistic fairness audit combining multiple techniques to identify specific attention heads responsible for demographic bias in vision transformers.
Findings
Identified four terminal-layer attention heads whose ablation reduces global gender bias (Cramer's V: 0.381 -> 0.362) while marginally improving accuracy (+0.42%).
Demonstrated that bias localization is feasible at the head level in vision transformers.
Found that age bias is more diffusely encoded than gender bias in CLIP.
Abstract
Standard fairness audits of foundation models quantify that a model is biased, but not where inside the network the bias resides. We propose a mechanistic fairness audit that combines projected residual-stream decomposition, zero-shot Concept Activation Vectors, and bias-augmented TextSpan analysis to locate demographic bias at the level of individual attention heads in vision transformers. As a feasibility case study, we apply this pipeline to the CLIP ViT-L-14 encoder on 42 profession classes of the FACET benchmark, auditing both gender and age bias. For gender, the pipeline identifies four terminal-layer heads whose ablation reduces global bias (Cramer's V: 0.381 -> 0.362) while marginally improving accuracy (+0.42%); a layer-matched random control confirms that this effect is specific to the identified heads. A single head in the final layer contributes to the majority of the…
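The abstract reports global bias as Cramer's V, a chi-squared-based measure of association between two categorical variables that ranges from 0 (independent) to 1 (perfectly associated). The following is a minimal sketch of that statistic, not the authors' evaluation code. It assumes the input is a contingency table of counts, for example demographic group by predicted class; this excerpt does not say how the paper builds its table. The function name `cramers_v` and the toy counts are hypothetical, and the sketch omits any bias correction.

```python
import math

def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts.

    Assumes every row and column has a nonzero total, so that all
    expected counts are positive.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by sample size and the smaller table dimension minus one.
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical toy tables (rows: demographic group, columns: predicted class).
independent = [[10, 20], [20, 40]]   # predictions do not depend on group
fully_biased = [[30, 0], [0, 30]]    # predictions determined by group

v_independent = cramers_v(independent)
v_fully_biased = cramers_v(fully_biased)
```

Under this reading, a drop such as 0.381 -> 0.362 after ablating the identified heads means the model's predictions became slightly less associated with the demographic attribute.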
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
