Debiasing Convolutional Neural Networks via Meta Orthogonalization
Kurtis Evan David, Qiang Liu, Ruth Fong

TL;DR
This paper introduces Meta Orthogonalization, a method for reducing bias in CNNs by orthogonalizing concept representations, effectively mitigating bias while preserving task performance.
Contribution
The paper proposes a novel Meta Orthogonalization technique that disentangles concept representations in CNNs to reduce bias without sacrificing accuracy.
Findings
Significantly reduces model bias in CNNs.
Performs competitively against adversarial debiasing methods.
Maintains strong downstream task performance.
Abstract
While deep learning models often achieve strong task performance, their successes are hampered by their inability to disentangle spurious correlations from causative factors, such as when they use protected attributes (e.g., race or gender) to make decisions. In this work, we tackle the problem of debiasing convolutional neural networks (CNNs) in such instances. Building on existing work on debiasing word embeddings and model interpretability, our Meta Orthogonalization method encourages the CNN representations of different concepts (e.g., gender and class labels) to be orthogonal to one another in activation space while maintaining strong downstream task performance. Through a variety of experiments, we systematically test our method and demonstrate that it significantly mitigates model bias and is competitive against current adversarial debiasing methods.
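To make the idea of "orthogonal in activation space" concrete, the following is a minimal sketch of one possible orthogonality penalty: the sum of squared cosine similarities between a bias-concept direction and each class-concept direction. It is an illustration under stated assumptions, not the authors' implementation. The names `cosine`, `orthogonality_penalty`, `gender_dir`, and `class_dirs` are hypothetical, the vectors are toy values, and the abstract does not specify how concept directions are obtained or how the penalty enters training.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def orthogonality_penalty(bias_vec, class_vecs):
    """Sum of squared cosine similarities (hypothetical penalty).

    The value is 0 when the bias direction is orthogonal to every class
    direction and grows as the directions become more aligned.
    """
    return sum(cosine(bias_vec, c) ** 2 for c in class_vecs)


# Toy concept directions in a 3-dimensional activation space (assumed values).
gender_dir = [1.0, 0.0, 0.0]
class_dirs = [
    [0.0, 1.0, 0.0],  # orthogonal to the bias direction: contributes 0
    [1.0, 1.0, 0.0],  # at 45 degrees to the bias direction: contributes about 0.5
]

penalty = orthogonality_penalty(gender_dir, class_dirs)
print(penalty)  # approximately 0.5
```

In a training setup, a term of this kind would typically be added to the task loss with a weighting coefficient, so that lowering the penalty pushes class directions away from the bias direction while the task loss preserves accuracy. That weighting is likewise an assumption here.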
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
