TL;DR
FairLLaVA is a parameter-efficient fine-tuning method for multimodal large language models that reduces performance disparities across demographic groups without sacrificing overall performance, with a focus on medical imaging tasks.
Contribution
The paper introduces a parameter-efficient, architecture-agnostic method that uses a mutual-information objective to reduce the demographic information carried by model representations, mitigating bias in multimodal models.
Findings
Reduces inter-group disparities in medical image report generation and visual question answering.
Improves fairness and clinical performance across diverse medical imaging modalities.
Maintains high-quality natural language generation while enhancing equity.
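As a rough illustration of what "inter-group disparity" can mean in practice, the sketch below computes per-group mean scores and the largest gap between any two groups. The function name `group_gap`, the scores, and the group labels are all hypothetical; the metrics the paper actually reports are not specified in this summary.

```python
from collections import defaultdict
from statistics import mean

def group_gap(scores, groups):
    """Return per-group mean scores and the largest gap between any two groups."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    means = {g: mean(v) for g, v in by_group.items()}
    return means, max(means.values()) - min(means.values())

# Hypothetical per-report quality scores and demographic labels (not from the paper).
scores = [0.80, 0.70, 0.60, 0.50]
groups = ["A", "A", "B", "B"]
means, gap = group_gap(scores, groups)
```

A smaller `gap` at a similar overall mean would correspond to the kind of improvement the findings describe.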
Abstract
While powerful in image-conditioned generation, multimodal large language models (MLLMs) can display uneven performance across demographic groups, highlighting fairness risks. In safety-critical clinical settings, such disparities risk producing unequal diagnostic narratives and eroding trust in AI-assisted decision-making. While fairness has been studied extensively in vision-only and language-only models, its impact on MLLMs remains largely underexplored. To address these biases, we introduce FairLLaVA, a parameter-efficient fine-tuning method that mitigates group disparities in visual instruction tuning without compromising overall performance. By minimizing the mutual information between target attributes, FairLLaVA regularizes the model's representations to be demographic-invariant. The method can be incorporated as a lightweight plug-in, maintaining efficiency with low-rank…
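To make the mutual-information idea concrete, here is a minimal sketch of a plug-in estimate of I(Z; A) between a discretized representation Z and a demographic attribute A. This is an assumption-laden illustration, not the paper's estimator: the abstract does not say how mutual information is estimated, and continuous representations would normally need a different estimator. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(z, a):
    """Plug-in estimate of I(Z; A) in nats for two equal-length discrete sequences."""
    n = len(z)
    joint = Counter(zip(z, a))  # counts of each (z, a) pair
    count_z = Counter(z)        # marginal counts of z
    count_a = Counter(a)        # marginal counts of a
    mi = 0.0
    for (zi, ai), c in joint.items():
        # p(z, a) * log( p(z, a) / (p(z) * p(a)) ), written in terms of counts
        mi += (c / n) * math.log(c * n / (count_z[zi] * count_a[ai]))
    return mi

# Toy codes: the first representation reveals the group, the second does not.
attribute = ["A", "A", "B", "B"]
leaky = mutual_information([0, 0, 1, 1], attribute)
invariant = mutual_information([0, 1, 0, 1], attribute)
```

In this toy case `leaky` equals log 2 (the representation fully determines the group) and `invariant` is zero. A demographic-invariant representation in the abstract's sense is one for which a quantity like this is driven toward zero during fine-tuning.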
