Learning Counterfactually Fair Models via Improved Generation with Neural Causal Models
Krishn Vishwas Kher, Saksham Mittal, Aditya Varun V, Shantanu Das, SakethaNath Jagarlapudi

TL;DR
This paper introduces an approach for learning counterfactually fair models that uses neural causal models to generate more faithful counterfactual samples and adds explicit regularizers that enforce the fairness condition directly, improving the trade-off between fairness and generalization.
Contribution
It proposes using Neural Causal Models with a kernel least squares loss and an MMD-based regularizer to directly enforce counterfactual fairness during model training.
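As a rough illustration of what an MMD-based penalty measures, the sketch below computes the standard biased estimate of squared maximum mean discrepancy between two sets of scalar predictions, using a Gaussian kernel. This summary does not give the paper's exact regularizer, so the function names, the kernel choice, and the `bandwidth` parameter are assumptions made for illustration only.

```python
import math

def rbf(u, v, bandwidth=1.0):
    # Gaussian (RBF) kernel between two scalars; bandwidth is an assumed hyperparameter.
    return math.exp(-((u - v) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    # Biased (V-statistic) estimate of squared MMD between samples xs and ys:
    # mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    def mean_kernel(a, b):
        return sum(rbf(u, v, bandwidth) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Hypothetical predictions on factual inputs and on their counterfactual versions.
factual_preds = [0.1, 0.4, 0.5, 0.9]
counterfactual_preds = [p + 2.0 for p in factual_preds]

same_gap = mmd_squared(factual_preds, factual_preds)
shifted_gap = mmd_squared(factual_preds, counterfactual_preds)
```

In a training loop, a term like this could be added to the prediction loss so that the distribution of predictions on counterfactual samples is pulled toward the distribution on factual samples. The estimate is zero when the two samples coincide and grows as they separate.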
Findings
Improved counterfactual sample generation fidelity.
Enhanced trade-off between fairness and generalization.
Better performance on synthetic and benchmark datasets.
Abstract
One of the main concerns while deploying machine learning models in real-world applications is fairness. Counterfactual fairness has emerged as an intuitive and natural definition of fairness. However, existing methodologies for enforcing counterfactual fairness seem to have two limitations: (i) generating counterfactual samples faithful to the underlying causal graph, and (ii) as we argue in this paper, existing regularizers are mere proxies and do not directly enforce the exact definition of counterfactual fairness. In this work, our aim is to mitigate both issues. Firstly, we propose employing Neural Causal Models (NCMs) for generating the counterfactual samples. For implementing the abduction step in NCMs, the posteriors of the exogenous variables need to be estimated given a counterfactual query, as they are not readily available. As a consequence, consistency with…
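For readers unfamiliar with the abduction step mentioned above, the sketch below walks through the usual abduction, action, prediction recipe on a toy additive-noise causal model, where the exogenous noise can be recovered exactly from an observation. This is not the NCM procedure from the paper (there the posterior over exogenous variables has to be estimated); the mechanism `x = weight * a + u_x` and all names are hypothetical.

```python
def counterfactual_x(x_obs, a_obs, a_cf, weight=2.0):
    # Toy structural equation (assumed for illustration): x = weight * a + u_x.
    # Abduction: recover the exogenous noise consistent with the observation.
    u_x = x_obs - weight * a_obs
    # Action: replace the sensitive attribute a_obs with the counterfactual value a_cf.
    # Prediction: push the same noise through the modified mechanism.
    return weight * a_cf + u_x

# An individual observed with a = 1 and x = 3.5; what would x have been with a = 0?
x_counterfactual = counterfactual_x(x_obs=3.5, a_obs=1, a_cf=0)
# Intervening with the value that was actually observed returns the observation.
x_unchanged = counterfactual_x(x_obs=3.5, a_obs=1, a_cf=1)
```

In this toy model the abduction step is a closed-form subtraction. With a neural causal model the mechanisms are learned networks, so the noise generally cannot be inverted this way and has to be inferred.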
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning
