D3G: Diverse Demographic Data Generation Increases Zero-Shot Image Classification Accuracy within Multimodal Models

Javon Hickmon

arXiv:2512.15747 · cs.LG · December 19, 2025

TL;DR

This paper introduces D3G, a training-free method that enhances zero-shot image classification accuracy and reduces demographic bias in multimodal models by generating diverse demographic data at inference time.

Contribution

The paper presents D3G, a novel approach that uses generative models to produce diverse demographic data, improving accuracy and fairness in zero-shot classification without additional training.

Findings

01 D3G improves zero-shot classification accuracy across multiple demographics.

02 Generating diverse demographic data reduces bias in model predictions.

03 The method is effective with models like CLIP and Stable Diffusion XL.
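
The findings above can be made concrete with a small sketch. This page does not spell out how D3G combines the generated data with the text prompt, so the following is only an illustration under stated assumptions: each class has one text embedding plus one embedding per demographic group taken from generated images, the group embeddings are averaged with equal weight, and the result is blended with the text embedding to form a class prototype. The function names, the `image_weight` parameter, and the random vectors standing in for CLIP embeddings of Stable Diffusion XL outputs are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def build_class_prototypes(text_emb, generated_emb, image_weight=0.5):
    """Blend text embeddings with demographically diverse image embeddings.

    text_emb:      (C, D) one text embedding per class.
    generated_emb: (C, G, D) one generated-image embedding per class and
                   demographic group (assumed layout, not from the paper).
    image_weight:  hypothetical mixing factor in [0, 1].
    """
    text_emb = l2_normalize(text_emb)
    generated_emb = l2_normalize(generated_emb)
    # Equal-weight mean over groups, so no single group dominates a class.
    image_proto = l2_normalize(generated_emb.mean(axis=1))
    blended = (1.0 - image_weight) * text_emb + image_weight * image_proto
    return l2_normalize(blended)


def classify(query_emb, prototypes):
    """Return the index of the most similar prototype for each query."""
    scores = l2_normalize(query_emb) @ prototypes.T
    return scores.argmax(axis=-1), scores


# Toy data: random vectors stand in for real encoder outputs.
rng = np.random.default_rng(0)
num_classes, num_groups, dim = 3, 4, 8
text_emb = rng.normal(size=(num_classes, dim))
generated_emb = rng.normal(size=(num_classes, num_groups, dim))

prototypes = build_class_prototypes(text_emb, generated_emb)

# A query identical to the prototype of class 1 should be assigned to class 1.
pred, scores = classify(prototypes[1:2], prototypes)
print(pred)
```

In a real pipeline the random arrays would be replaced by embeddings from an actual encoder. Equal-weight averaging is just one simple way to keep any single group from dominating a prototype; no training step is involved, which matches the training-free framing in the summary.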

Abstract

Image classification is a task essential for machine perception to achieve human-level image understanding. Multimodal models such as CLIP have been able to perform well on this task by learning semantic similarities across vision and language; however, despite these advances, image classification is still a challenging task. Models with low capacity often suffer from underfitting and thus underperform on fine-grained image classification. Along with this, it is important to ensure high-quality data with rich cross-modal representations of each class, which is often difficult to generate. When datasets do not enforce balanced demographics, the predictions will be biased toward the more represented class, while others will be neglected. We focus on how these issues can lead to harmful bias for zero-shot image classification, and explore how to combat these issues in demographic bias. We…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis