Language-guided Detection and Mitigation of Unknown Dataset Bias
Zaiying Zhao, Soichiro Kumano, Toshihiko Yamasaki

TL;DR
This paper introduces a framework for detecting and mitigating dataset bias in classifiers without prior bias knowledge, using caption analysis and data augmentation, leading to improved fairness and performance.
Contribution
It presents a novel bias detection method based on caption keywords and two debiasing techniques, outperforming prior approaches that work without prior bias knowledge and performing comparably to a method that assumes it.
Findings
Outperforms existing methods that do not use prior bias knowledge
Achieves comparable results to methods with prior bias information
Enhances classifier fairness and accuracy on biased datasets
Abstract
Dataset bias is a significant problem in training fair classifiers. When attributes unrelated to classification exhibit strong biases towards certain classes, classifiers trained on such datasets may overfit to these bias attributes, substantially reducing the accuracy for minority groups. Mitigation techniques can be categorized according to the availability of bias information (i.e., prior knowledge). Although scenarios with unknown biases are better suited for real-world settings, previous work in this field often suffers from a lack of interpretability regarding biases and lower performance. In this study, we propose a framework to identify potential biases as keywords without prior knowledge based on the partial occurrence in the captions. We further propose two debiasing methods: (a) handing over to an existing debiasing approach which requires prior knowledge by assigning…
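To make the idea of caption keywords as bias candidates concrete, here is a minimal Python sketch. It is not the authors' algorithm: the function name `candidate_bias_keywords`, the whitespace tokenization, the `min_count` and `skew_threshold` parameters, and the toy captions are all assumptions made for illustration. It only flags words whose caption occurrences are concentrated in one class, which is one simple way a keyword could signal a class-correlated attribute.

```python
from collections import Counter, defaultdict


def candidate_bias_keywords(captions, labels, min_count=3, skew_threshold=0.7):
    """Flag words whose caption occurrences skew towards a single class.

    Hypothetical helper for illustration only; the thresholds are assumptions.
    Returns {word: (dominant_label, share_of_occurrences_in_that_label)}.
    """
    per_class = defaultdict(Counter)  # label -> word -> number of captions
    total = Counter()                 # word -> number of captions overall

    for caption, label in zip(captions, labels):
        # Count each word at most once per caption.
        for word in set(caption.lower().split()):
            per_class[label][word] += 1
            total[word] += 1

    flagged = {}
    for word, n in total.items():
        if n < min_count:
            continue  # too rare to say anything about
        top_label, top_n = max(
            ((label, counts[word]) for label, counts in per_class.items()),
            key=lambda pair: pair[1],
        )
        share = top_n / n
        if share >= skew_threshold:
            flagged[word] = (top_label, share)
    return flagged


# Toy captions (invented for this example).
captions = [
    "bird swimming on water",
    "bird floating on water",
    "bird standing near water",
    "bird perched in forest",
    "bird sitting in forest",
    "bird walking near water",
]
labels = ["waterbird", "waterbird", "waterbird", "landbird", "landbird", "landbird"]

flagged = candidate_bias_keywords(captions, labels)
print(flagged)
```

In this toy set, "bird" appears equally in both classes and is not flagged, while "water" appears mostly with one class and is. A real pipeline would need a captioning model, proper keyword extraction, and a principled selection criterion, none of which are specified in the text above.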
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Anomaly Detection Techniques and Applications
