D3G: Diverse Demographic Data Generation Increases Zero-Shot Image Classification Accuracy within Multimodal Models
Javon Hickmon

TL;DR
This paper introduces D3G, a training-free method that enhances zero-shot image classification accuracy and reduces demographic bias in multimodal models by generating diverse demographic data at inference time.
Contribution
The paper presents D3G, an approach that uses generative models to produce demographically diverse data at inference time, improving accuracy and fairness in zero-shot classification without additional training.
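The general idea of enriching zero-shot classification with generated examples can be illustrated with a small sketch. This is not the paper's implementation: the summary does not specify how generated data is combined with text prompts, so the prototype-averaging scheme, the `text_weight` parameter, the function names, and the three-dimensional toy embeddings below are all assumptions made for illustration. In practice the embeddings would come from a multimodal encoder such as CLIP, and the per-class images from a generator such as Stable Diffusion XL prompted with varied demographic descriptions.

```python
import math

def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def class_prototype(text_emb, generated_embs, text_weight=0.5):
    """Blend a class's text embedding with the mean embedding of its
    generated images (one per demographic group).
    ASSUMPTION: this weighted average is illustrative, not the paper's rule."""
    image_mean = normalize(mean_vector([normalize(e) for e in generated_embs]))
    text_unit = normalize(text_emb)
    blended = [text_weight * t + (1.0 - text_weight) * i
               for t, i in zip(text_unit, image_mean)]
    return normalize(blended)

def classify(query_emb, prototypes):
    """Return the best-scoring label and all cosine scores."""
    scores = {label: cosine(query_emb, proto) for label, proto in prototypes.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy embeddings standing in for encoder outputs.
text_embs = {"doctor": [1.0, 0.0, 0.0], "nurse": [0.0, 1.0, 0.0]}
# Two generated images per class; the third axis stands in for a
# demographic attribute that differs between the generated images.
generated_embs = {
    "doctor": [[0.8, 0.1, 0.6], [0.8, 0.1, -0.6]],
    "nurse": [[0.1, 0.8, 0.6], [0.1, 0.8, -0.6]],
}

prototypes = {
    label: class_prototype(text_embs[label], generated_embs[label])
    for label in text_embs
}

query = [0.7, 0.2, 0.5]  # hypothetical embedding of the image to classify
predicted, scores = classify(query, prototypes)
print(predicted, scores)
```

Averaging over generated images that differ along the demographic axis cancels that axis out of the class prototype, so in this toy setup the decision depends on the class-related directions rather than on the demographic attribute. Whether D3G uses this exact mechanism is not stated in the summary above.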
Findings
D3G improves zero-shot classification accuracy across multiple demographics.
Generating diverse demographic data reduces bias in model predictions.
The method is effective with models like CLIP and Stable Diffusion XL.
Abstract
Image classification is a task essential for machine perception to achieve human-level image understanding. Multimodal models such as CLIP have been able to perform well on this task by learning semantic similarities across vision and language; however, despite these advances, image classification is still a challenging task. Models with low capacity often suffer from underfitting and thus underperform on fine-grained image classification. Along with this, it is important to ensure high-quality data with rich cross-modal representations of each class, which is often difficult to generate. When datasets do not enforce balanced demographics, the predictions will be biased toward the more represented class, while others will be neglected. We focus on how these issues can lead to harmful bias for zero-shot image classification, and explore how to combat these issues in demographic bias. We…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
