A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks

Hoin Jung; Taeuk Jang; Xiaoqian Wang

arXiv:2410.07593·cs.CV·October 30, 2024

TL;DR

This paper presents SFID, a versatile debiasing method for Vision-Language Models that reduces societal biases across multiple tasks without retraining, maintaining performance and enhancing fairness.

Contribution

Introduces SFID, a novel debiasing technique combining feature pruning and low confidence imputation, applicable across various VLM tasks without retraining.
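The two ingredients named above, feature pruning and low confidence imputation, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that per-feature importance scores for predicting a sensitive attribute and per-sample classifier confidences are already available from some attribute classifier on frozen embeddings; how they are obtained is not described on this page. The function names, the threshold, and the toy data are all hypothetical.

```python
from statistics import mean


def top_k_indices(importance, k):
    """Indices of the k largest importance scores (ties go to the lower index)."""
    order = sorted(range(len(importance)), key=lambda i: (-importance[i], i))
    return sorted(order[:k])


def low_confidence_means(embeddings, confidences, threshold, indices):
    """Mean of each selected feature over samples the classifier is unsure about.

    Assumption: a sample counts as low confidence when its attribute-classifier
    confidence is at most `threshold`.
    """
    ambiguous = [e for e, c in zip(embeddings, confidences) if c <= threshold]
    if not ambiguous:
        raise ValueError("no sample falls at or below the confidence threshold")
    return {i: mean(e[i] for e in ambiguous) for i in indices}


def impute(embedding, fill_values):
    """Return a copy of the embedding with the selected features overwritten."""
    out = list(embedding)
    for i, value in fill_values.items():
        out[i] = value
    return out


# Toy data (hypothetical): four 3-dimensional embeddings.
embeddings = [
    [0.9, 0.1, 0.5],
    [0.1, 0.2, 0.5],
    [0.6, 0.3, 0.4],
    [0.4, 0.1, 0.6],
]
importance = [0.7, 0.1, 0.2]           # assumed given: feature 0 carries the attribute
confidences = [0.95, 0.90, 0.55, 0.52]  # assumed given: last two samples are ambiguous

selected = top_k_indices(importance, k=1)
fill = low_confidence_means(embeddings, confidences, threshold=0.6, indices=selected)
debiased = [impute(e, fill) for e in embeddings]
```

In this sketch the selected features are overwritten with a value taken from ambiguous samples rather than set to zero, so the embedding stays in a plausible range while the attribute signal in those dimensions is removed. Because only stored embeddings are edited, no model weights change, which is consistent with the claim that no retraining is needed.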

Findings

01 · Significantly reduces gender biases in VLMs

02 · Maintains model performance across tasks

03 · Applicable to multiple VLM architectures

Abstract

Recent advancements in Vision-Language Models (VLMs) have enabled complex multimodal tasks by processing text and image data simultaneously, significantly enhancing the field of artificial intelligence. However, these models often exhibit biases that can skew outputs towards societal stereotypes, thus necessitating debiasing strategies. Existing debiasing methods focus narrowly on specific modalities or tasks, and require extensive retraining. To address these limitations, this paper introduces Selective Feature Imputation for Debiasing (SFID), a novel methodology that integrates feature pruning and low confidence imputation (LCI) to effectively reduce biases in VLMs. SFID is versatile, maintaining the semantic integrity of outputs, and cost-effective, eliminating the need for retraining. Our experimental results demonstrate SFID's effectiveness across various VLM tasks including…

Code & Models

Repositories

HoinJung/Unified-Debiaisng-VLM-SFID
jax · Official

Videos

A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks · slideslive

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Pruning · Focus