Fast Model Debias with Machine Unlearning
Ruizhe Chen, Jianfei Yang, Huimin Xiong, Jianhong Bai, Tianxiang Hu, Jin Hao, Yang Feng, Joey Tianyi Zhou, Jian Wu, Zuozhu Liu

TL;DR
This paper introduces FMD, a fast framework for debiasing trained neural networks via machine unlearning. It reduces bias using little data and computation, improving fairness without retraining from scratch.
Contribution
The proposed FMD framework identifies biases using counterfactual concepts and influence functions, then removes them from the trained model through machine unlearning.
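To give a feel for how influence functions can drive unlearning, here is a minimal sketch on ridge regression. It is not the FMD algorithm: the model, the synthetic data, and the helper names `fit_ridge` and `influence_unlearn` are assumptions made for illustration only. It applies the standard influence-function step (inverse Hessian times the gradient of the removed samples) and compares the result with retraining on the remaining data.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize sum_i 0.5*(x_i . theta - y_i)^2 + 0.5*lam*||theta||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def influence_unlearn(X, y, theta, remove_idx, lam):
    """Approximately remove rows `remove_idx` with one influence-function step.

    Hypothetical helper for illustration: theta_new = theta + H^{-1} g, where H is
    the Hessian of the full objective and g is the summed loss gradient of the
    removed rows, both evaluated at the current theta.
    """
    d = X.shape[1]
    H = X.T @ X + lam * np.eye(d)        # Hessian of the full training objective
    Xr, yr = X[remove_idx], y[remove_idx]
    g = Xr.T @ (Xr @ theta - yr)         # gradient contribution of the removed rows
    return theta + np.linalg.solve(H, g)

# Synthetic data (assumed, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
lam = 1.0
remove_idx = np.arange(5)                # pretend these five samples must be forgotten
keep = np.setdiff1d(np.arange(200), remove_idx)

theta_full = fit_ridge(X, y, lam)                      # trained on everything
theta_retrained = fit_ridge(X[keep], y[keep], lam)     # reference: retrain from scratch
theta_unlearned = influence_unlearn(X, y, theta_full, remove_idx, lam)

# Distance to the retrained reference before and after the unlearning step.
err_before = np.linalg.norm(theta_full - theta_retrained)
err_after = np.linalg.norm(theta_unlearned - theta_retrained)
print(err_before, err_after)
```

The appeal of this style of update is that it needs only the gradients of the samples to forget plus a Hessian solve, rather than another pass over the full training set. For deep networks the Hessian is usually approximated, which this toy example does not attempt.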
Findings
Achieves comparable or better accuracy than state-of-the-art methods.
Reduces biases significantly with less data and computational effort.
Works effectively on large language models and various datasets.
Abstract
Recent discoveries have revealed that deep neural networks might behave in a biased manner in many real-world scenarios. For instance, deep networks trained on CelebA, a large-scale face recognition dataset, tend to predict blonde hair for females and black hair for males. Such biases not only jeopardize the robustness of models but also perpetuate and amplify social biases, which is especially concerning for automated decision-making processes in healthcare, recruitment, etc., as they could exacerbate unfair economic and social inequalities among different groups. Existing debiasing methods suffer from high costs in bias labeling or model re-training, and they also fall short in explaining where biases originate within the model. To this end, we propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases…
Taxonomy
Topics: Ethics and Social Impacts of AI
