Language-guided Detection and Mitigation of Unknown Dataset Bias

Zaiying Zhao; Soichiro Kumano; Toshihiko Yamasaki

arXiv:2406.02889·cs.CV·June 6, 2024

TL;DR

This paper introduces a framework for detecting and mitigating dataset bias in classifiers without prior bias knowledge, using caption analysis and data augmentation, leading to improved fairness and performance.

Contribution

It presents a novel bias detection method based on caption keywords and two debiasing techniques, outperforming prior approaches that work without prior bias knowledge and performing comparably to approaches that require it.

Findings

01. Outperforms existing methods that operate without prior bias knowledge

02. Achieves results comparable to methods that use prior bias information

03. Improves classifier fairness and accuracy on biased datasets

Abstract

Dataset bias is a significant problem in training fair classifiers. When attributes unrelated to classification exhibit strong biases towards certain classes, classifiers trained on such a dataset may overfit to these bias attributes, substantially reducing the accuracy for minority groups. Mitigation techniques can be categorized according to the availability of bias information (i.e., prior knowledge). Although scenarios with unknown biases are better suited for real-world settings, previous work in this field often suffers from a lack of interpretability regarding biases and from lower performance. In this study, we propose a framework that, without prior knowledge, identifies potential biases as keywords based on their partial occurrence in the captions. We further propose two debiasing methods: (a) handing over to an existing debiasing approach which requires prior knowledge by assigning…
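As a rough illustration of what caption-based keyword detection could look like, the sketch below counts how often each caption word appears in each class and flags words that occur in only part of the dataset and are concentrated in one class. This is a simplified stand-in, not the paper's procedure, which this excerpt does not spell out. The function names, the `STOPWORDS` set, the thresholds, and the toy captions are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"in", "on", "near", "a", "the"}

def keyword_class_skew(captions, labels, min_count=3):
    """Return {word: (majority_class, majority_share, coverage)}.

    majority_share: fraction of the word's captions that belong to its most
    frequent class. coverage: fraction of all captions containing the word.
    """
    per_word = defaultdict(Counter)  # word -> Counter of class labels
    for caption, label in zip(captions, labels):
        for word in set(caption.lower().split()):  # count once per caption
            if word not in STOPWORDS:
                per_word[word][label] += 1

    n = len(captions)
    stats = {}
    for word, counts in per_word.items():
        total = sum(counts.values())
        # Skip rare words and words present in every caption
        # (the latter carry no partial-occurrence signal).
        if total < min_count or total == n:
            continue
        cls, top = counts.most_common(1)[0]
        stats[word] = (cls, top / total, total / n)
    return stats

def candidate_bias_keywords(stats, skew_threshold=0.7):
    """Words whose captions are concentrated in a single class."""
    return sorted(w for w, (_, share, _) in stats.items() if share >= skew_threshold)

# Toy captions (invented): background words correlate with the class label.
captions = [
    "bird swimming on water", "bird floating on water",
    "bird standing near water", "bird standing in forest",
    "bird perched in forest", "bird walking in forest",
    "bird resting in forest", "bird resting near water",
]
labels = ["waterbird"] * 4 + ["landbird"] * 4

stats = keyword_class_skew(captions, labels)
print(candidate_bias_keywords(stats))  # ['forest', 'water']
```

Here "bird" is dropped because it appears in every caption, while "water" and "forest" each appear in half of the captions and fall 75% within one class, so they surface as candidate bias keywords. A keyword found this way could then serve as the bias label for a debiasing method that expects prior knowledge.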

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Anomaly Detection Techniques and Applications