Locating Demographic Bias at the Attention-Head Level in CLIP's Vision Encoder

Alaa Yasser; Kittipat Phunjanna; Marcos Escudero Viñolo; Catarina Barata; Jenny Benois-Pineau

arXiv:2603.11793 · cs.CV · March 13, 2026

Open Access

TL;DR

This paper introduces a method to locate demographic bias within individual attention heads of CLIP's vision transformer, revealing that bias can be localized and mitigated at the head level, with varying effectiveness across attributes.

Contribution

It presents a mechanistic fairness audit that combines projected residual-stream decomposition, zero-shot Concept Activation Vectors, and bias-augmented TextSpan analysis to identify the specific attention heads responsible for demographic bias in vision transformers.

Findings

01. Identified four terminal-layer attention heads whose ablation reduces gender bias (Cramér's V: 0.381 -> 0.362) while marginally improving accuracy (+0.42%).

02. Demonstrated that bias localization is feasible at the head level in vision transformers.

03. Found that age bias is more diffusely encoded than gender bias in CLIP.
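
The head-ablation experiment with a layer-matched random control can be illustrated with a minimal sketch. It assumes that the image representation is the sum of per-head contributions (as in a residual-stream decomposition) and that ablation means replacing a head's contribution with its dataset mean; the paper may use a different ablation scheme. The function names, array layout, and toy sizes below are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def ablate_heads(contribs, heads):
    """Mean-ablate the given (layer, head) pairs.

    contribs: array of shape (n_images, n_layers, n_heads, dim) holding each
    head's additive contribution to the image representation (assumed layout).
    Returns the summed representations, shape (n_images, dim).
    """
    out = contribs.copy()
    for layer, head in heads:
        # Replace the per-image contribution with its mean over all images.
        out[:, layer, head, :] = contribs[:, layer, head, :].mean(axis=0)
    return out.sum(axis=(1, 2))

def random_layer_matched(heads, n_heads, rng):
    """Sample a control set with the same number of heads per layer,
    excluding the heads that were originally selected."""
    by_layer = {}
    for layer, head in heads:
        by_layer.setdefault(layer, []).append(head)
    control = []
    for layer, chosen in by_layer.items():
        pool = [h for h in range(n_heads) if h not in chosen]
        picks = rng.choice(pool, size=len(chosen), replace=False)
        control.extend((layer, int(h)) for h in picks)
    return control

# Toy data: 8 images, 3 layers, 4 heads per layer, 5-dimensional embeddings.
rng = np.random.default_rng(0)
contribs = rng.normal(size=(8, 3, 4, 5))
target = [(2, 0), (2, 3)]          # hypothetical "biased" heads in the last layer
full = contribs.sum(axis=(1, 2))   # unablated representations
ablated = ablate_heads(contribs, target)
control = random_layer_matched(target, n_heads=4, rng=rng)
ablated_control = ablate_heads(contribs, control)
```

Comparing a bias metric on `ablated` against the same metric on `ablated_control` indicates whether the effect is specific to the selected heads or would appear for any heads in the same layer.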

Abstract

Standard fairness audits of foundation models quantify that a model is biased, but not where inside the network the bias resides. We propose a mechanistic fairness audit that combines projected residual-stream decomposition, zero-shot Concept Activation Vectors, and bias-augmented TextSpan analysis to locate demographic bias at the level of individual attention heads in vision transformers. As a feasibility case study, we apply this pipeline to the CLIP ViT-L-14 encoder on 42 profession classes of the FACET benchmark, auditing both gender and age bias. For gender, the pipeline identifies four terminal-layer heads whose ablation reduces global bias (Cramér's V: 0.381 -> 0.362) while marginally improving accuracy (+0.42%); a layer-matched random control confirms that this effect is specific to the identified heads. A single head in the final layer contributes to the majority of the…
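
The abstract reports global bias as Cramér's V, a standard measure of association between two categorical variables computed from a contingency table as V = sqrt(chi² / (n · min(r − 1, c − 1))). The sketch below shows the uncorrected textbook formula; it assumes the table crosses predicted profession with the demographic attribute, which is an interpretation and may differ from the paper's exact setup. The toy counts are invented for illustration.

```python
import numpy as np

def cramers_v(table):
    """Uncorrected Cramér's V for an r x c contingency table of counts.

    Assumes every row and column has a nonzero total.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence of rows and columns.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

# Rows: predicted class, columns: demographic group (toy counts).
perfect = cramers_v([[10, 0], [0, 10]])       # prediction fully determined by group
independent = cramers_v([[10, 20], [20, 40]])  # prediction independent of group
partial = cramers_v([[30, 10], [15, 25], [20, 20]])
```

A value of 0 indicates no association between predictions and the attribute, and 1 indicates that one fully determines the other, so a drop in V after ablation means weaker association.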

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education