IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models
Aarish Shah Mohsin, Mohammed Tayyab Ilyas Khan, Mohammad Nadeem, Shahab Saquib Sohail, Erik Cambria, Jiechao Gao

TL;DR
This paper introduces IndicFairFace, a balanced Indian face dataset that captures intra-national diversity, and uses it to evaluate and reduce geographical bias in vision-language models (VLMs) so that India's regions are represented more fairly.
Contribution
The paper presents IndicFairFace, the first dataset focused on intra-national Indian diversity, and demonstrates a debiasing method that reduces geographical bias with minimal impact on model accuracy.
Findings
IndicFairFace effectively captures geographical diversity within India.
Debiasing reduces geographical bias with an accuracy drop of less than 1.5%.
IndicFairFace serves as a benchmark for Indian geographical bias in VLMs.
Abstract
Vision-Language Models (VLMs) are known to inherit and amplify societal biases from their web-scale training data, with Indians being particularly misrepresented. Existing fairness-aware datasets have significantly improved demographic balance across global race and gender groups, yet they continue to treat Indian as a single monolithic category. This oversimplification ignores the vast intra-national diversity across India's 28 states and 8 Union Territories and leads to representational and geographical bias. To address this limitation, we present IndicFairFace, a novel, balanced face dataset of 14,400 images representing the geographical diversity of India. Images were sourced ethically from Wikimedia Commons and open-license web repositories and balanced uniformly across states and gender. Using IndicFairFace, we quantify intra-national geographical bias in prominent CLIP-based…
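Uniform balancing across region and gender amounts to stratified sampling with an equal quota per (region, gender) cell. The sketch below is illustrative only and is not the authors' pipeline. It assumes that each candidate image carries a region label and a binary gender label, and that all 36 regions (28 states and 8 Union Territories) are included. Under that assumption, the quota works out to 14,400 / (36 × 2) = 200 images per cell. The record layout, region names and the function `balanced_sample` are hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(records, per_cell, seed=0):
    """Draw exactly `per_cell` images for every (region, gender) cell.

    `records` is an iterable of (image_id, region, gender) tuples; this
    layout is a hypothetical stand-in, not the dataset's actual schema.
    """
    cells = defaultdict(list)
    for image_id, region, gender in records:
        cells[(region, gender)].append(image_id)

    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    sample = []
    for key in sorted(cells):
        pool = cells[key]
        # A cell that cannot meet the quota would break uniform balance.
        if len(pool) < per_cell:
            raise ValueError(f"cell {key} has only {len(pool)} images, need {per_cell}")
        sample.extend((image_id, *key) for image_id in rng.sample(pool, per_cell))
    return sample


if __name__ == "__main__":
    # Placeholder labels: 36 regions (assumed 28 states + 8 UTs), 2 genders.
    regions = [f"region_{i:02d}" for i in range(36)]
    genders = ["female", "male"]
    # Synthetic candidate pool with 250 images per cell.
    pool = [(f"{r}_{g}_{k}", r, g) for r in regions for g in genders for k in range(250)]
    quota = 14_400 // (len(regions) * len(genders))  # 200 per cell
    sample = balanced_sample(pool, per_cell=quota)
    print(quota, len(sample))  # 200 14400
```

Raising an error on an under-filled cell, rather than silently taking fewer images, keeps every cell the same size. Under this scheme, a scarce region would need more sourcing and could not simply be down-weighted.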
Taxonomy
Topics: Multimodal Machine Learning Applications · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
