Discover the Unknown Biased Attribute of an Image Classifier
Zhiheng Li, Chenliang Xu

TL;DR
This paper introduces a method that automatically discovers unknown biased attributes of image classifiers by optimizing hyperplanes in the latent space of a generative model, reducing reliance on human experts to conjecture potential biases.
Contribution
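The paper proposes a framework that represents image attributes as hyperplanes in a generative model's latent space and combines a total-variation loss with an orthogonalization penalty to identify hidden biases in classifiers without prior knowledge of what those biases are.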
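The orthogonalization idea can be illustrated with plain vector arithmetic. The minimal sketch below assumes that both the discovered (biased) attribute and the target attribute are represented by hyperplanes with normal vectors in the same latent space, and it uses the squared cosine similarity between the two normals as the penalty. The function names and this exact penalty form are hypothetical choices for illustration, not the paper's formulation.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("normal vector must be non-zero")
    return [x / norm for x in v]


def orthogonality_penalty(n_bias: Sequence[float], n_target: Sequence[float]) -> float:
    """Hypothetical penalty: squared cosine similarity of two hyperplane normals.

    Returns 0.0 when the normals are orthogonal and 1.0 when they are
    parallel or anti-parallel. Illustrative only, not the paper's exact term.
    """
    a, b = _unit(n_bias), _unit(n_target)
    cos = sum(x * y for x, y in zip(a, b))
    return cos * cos


if __name__ == "__main__":
    n_target = [1.0, 0.0, 0.0]  # illustrative target-attribute normal
    for n_bias in ([0.0, 2.0, 0.0], [1.0, 1.0, 0.0], [-3.0, 0.0, 0.0]):
        print(n_bias, orthogonality_penalty(n_bias, n_target))
```

In a full objective, a term like this would be added with some weight to the attribute-discovery loss so the search is pushed away from directions that merely re-encode the target attribute. Squaring the cosine makes the penalty indifferent to which way each normal points, since a hyperplane's normal can be flipped without changing the boundary it describes.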
Findings
Successfully discovers unnoticeable biased attributes in various classifiers.
Achieves better disentanglement of target and biased attributes.
Demonstrates generalizability across diverse image domains.
Abstract
Recent works find that AI algorithms learn biases from data. Therefore, it is urgent and vital to identify biases in AI algorithms. However, the previous bias identification pipeline overly relies on human experts to conjecture potential biases (e.g., gender), which may neglect other underlying biases not realized by humans. To help human experts better find the AI algorithms' biases, we study a new problem in this work -- for a classifier that predicts a target attribute of the input image, discover its unknown biased attribute. To solve this challenging problem, we use a hyperplane in the generative model's latent space to represent an image attribute; thus, the original problem is transformed into optimizing the hyperplane's normal vector and offset. We propose a novel total-variation loss within this framework as the objective function and a new orthogonalization penalty as a…
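To make the hyperplane formulation concrete, here is a minimal pure-Python sketch. It assumes the generator and a binary classifier have already been composed into one function mapping a latent vector to a probability (`toy_prob` is a hypothetical stand-in for that composition). A candidate hyperplane with normal n and offset b is scored by the total-variation distance between predictions on paired latents placed on either side of it. This is a simplified objective in the spirit of the paper's total-variation loss, not its exact formulation; every name here is illustrative.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class Hyperplane:
    """Attribute hyperplane {z : n . z + b = 0} in a latent space (illustrative)."""

    normal: Vector  # n: normal vector, need not have unit length
    offset: float   # b: offset

    def _norm(self) -> float:
        norm = math.sqrt(sum(x * x for x in self.normal))
        if norm == 0.0:
            raise ValueError("normal vector must be non-zero")
        return norm

    def unit_normal(self) -> Vector:
        norm = self._norm()
        return [x / norm for x in self.normal]

    def signed_distance(self, z: Sequence[float]) -> float:
        """Positive on the side n points toward, negative on the other side."""
        dot = sum(n_i * z_i for n_i, z_i in zip(self.normal, z))
        return (dot + self.offset) / self._norm()

    def mirror_pair(self, z: Sequence[float], alpha: float):
        """Project z onto the hyperplane, then step alpha units to each side."""
        u = self.unit_normal()
        d = self.signed_distance(z)
        on_plane = [z_i - d * u_i for z_i, u_i in zip(z, u)]
        plus = [p + alpha * u_i for p, u_i in zip(on_plane, u)]
        minus = [p - alpha * u_i for p, u_i in zip(on_plane, u)]
        return plus, minus


def tv_score(plane: Hyperplane,
             prob_fn: Callable[[Sequence[float]], float],
             latents: List[Vector],
             alpha: float = 1.0) -> float:
    """Mean total-variation distance between predictions on opposite sides.

    prob_fn maps a latent vector to P(target = 1). For two Bernoulli
    distributions with parameters p and q, the TV distance is |p - q|.
    """
    total = 0.0
    for z in latents:
        plus, minus = plane.mirror_pair(z, alpha)
        total += abs(prob_fn(plus) - prob_fn(minus))
    return total / len(latents)


def toy_prob(z: Sequence[float]) -> float:
    """Hypothetical stand-in for classifier(generator(z)).

    Its output depends only on latent coordinate 1, which plays the role
    of a biased attribute the classifier should not rely on.
    """
    return 1.0 / (1.0 + math.exp(-4.0 * z[1]))


if __name__ == "__main__":
    random.seed(0)
    latents = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(100)]
    for normal in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
        plane = Hyperplane(normal=normal, offset=0.0)
        print(normal, round(tv_score(plane, toy_prob, latents), 3))
```

A hyperplane whose normal points along a direction the classifier is sensitive to receives a high score, while one along an ignored direction scores zero. A search over n and b would favor the high-scoring hyperplanes; this sketch only evaluates fixed candidates rather than optimizing them.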
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification
