IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models

Aarish Shah Mohsin; Mohammed Tayyab Ilyas Khan; Mohammad Nadeem; Shahab Saquib Sohail; Erik Cambria; Jiechao Gao

arXiv:2602.12659·cs.CV·February 16, 2026

TL;DR

This paper introduces IndicFairFace, a balanced Indian face dataset capturing intra-national diversity, to evaluate and reduce geographical bias in vision-language models and to support fairer representation of India’s regions.

Contribution

The paper presents IndicFairFace, the first dataset focusing on intra-national Indian diversity, and demonstrates a debiasing method that reduces geographical bias with an accuracy drop of less than 1.5%.

Findings

01

IndicFairFace effectively captures geographical diversity within India.

02

Debiasing reduces geographical bias with an accuracy drop of less than 1.5%.

03

IndicFairFace serves as a benchmark for Indian geographical bias in VLMs.

Abstract

Vision-Language Models (VLMs) are known to inherit and amplify societal biases from their web-scale training data, with Indians being particularly misrepresented. Existing fairness-aware datasets have significantly improved demographic balance across global race and gender groups, yet they continue to treat "Indian" as a single monolithic category. This oversimplification ignores the vast intra-national diversity across the 28 states and 8 Union Territories of India and leads to representational and geographical bias. To address this limitation, we present IndicFairFace, a novel and balanced face dataset comprising 14,400 images representing the geographical diversity of India. Images were sourced ethically from Wikimedia Commons and open-license web repositories and were uniformly balanced across states and gender. Using IndicFairFace, we quantify intra-national geographical bias in prominent CLIP-based…
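To make the "uniformly balanced across states and gender" claim concrete, here is a minimal arithmetic sketch. It rests on two assumptions that the abstract does not state: that all 36 regions (28 states plus 8 Union Territories) are covered, and that gender is labeled with two categories. The function names `per_cell_quota` and `is_balanced` are hypothetical and are not taken from the paper.

```python
from collections import Counter

TOTAL_IMAGES = 14_400   # stated in the abstract
NUM_REGIONS = 28 + 8    # assumption: every state and Union Territory is covered
NUM_GENDERS = 2         # assumption: two gender categories


def per_cell_quota(total, regions, genders):
    """Images per (region, gender) cell under a perfectly uniform split."""
    cells = regions * genders
    if total % cells != 0:
        raise ValueError("total does not divide evenly across cells")
    return total // cells


def is_balanced(labels):
    """True if every (region, gender) pair that appears has the same count.

    Note: this only compares cells that are present; it does not detect
    a region or gender that is missing entirely.
    """
    counts = Counter(labels)
    return len(set(counts.values())) <= 1


quota = per_cell_quota(TOTAL_IMAGES, NUM_REGIONS, NUM_GENDERS)
print(quota)
```

Under these assumptions the split works out to 200 images per region and gender, or 400 per region. If the dataset covers fewer regions or uses a different gender labeling, the per-cell figure changes accordingly.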

Taxonomy

Topics: Multimodal Machine Learning Applications · Face recognition and analysis · Domain Adaptation and Few-Shot Learning