Causal Bias Detection in Generative Artificial Intelligence
Drago Plecko

TL;DR
This paper develops a causal fairness framework tailored for generative AI models, enabling detailed analysis of bias pathways and mechanisms, with practical estimators demonstrated on language models.
Contribution
It introduces a novel causal fairness methodology specifically designed for generative AI, unifying it with standard ML approaches and providing tools for bias quantification.
Findings
New causal decomposition results for fairness impacts
Identification conditions and estimators for causal bias measures
Analysis of race and gender bias in large language models
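As background for the decomposition findings, the following is a minimal sketch of the classical idea such results build on: splitting a disparity driven by a protected attribute X into a direct effect and an indirect effect mediated by a variable W, using the mediation-style identity TE = DE + IE from prior causal inference work. The structural causal model, its coefficients, and the helper names `outcome` and `decompose` are hypothetical. This is not the paper's decomposition or estimator, which targets the generative setting.

```python
import random

# Hypothetical linear SCM (illustrative values, not taken from the paper):
#   W := A * X + eps_w           (mediator)
#   Y := B * X + C * W + eps_y   (outcome)
# There is no confounder, so the observed disparity equals the total effect.
A, B, C = 0.8, 0.5, 1.5

def outcome(x, w_x, eps_w, eps_y):
    """Counterfactual Y with X set to x on the direct path and W computed as if X = w_x."""
    w = A * w_x + eps_w
    return B * x + C * w + eps_y

def decompose(n=10_000, seed=0):
    """Monte Carlo estimates of the (total, direct, indirect) effects of X: 0 -> 1."""
    rng = random.Random(seed)
    te = de = ie = 0.0
    for _ in range(n):
        # Shared exogenous noise makes all three counterfactuals refer to the same unit.
        eps_w, eps_y = rng.gauss(0, 1), rng.gauss(0, 1)
        y0 = outcome(0, 0, eps_w, eps_y)   # Y_{x0}
        y1 = outcome(1, 1, eps_w, eps_y)   # Y_{x1}
        y10 = outcome(1, 0, eps_w, eps_y)  # Y_{x1, W_{x0}}
        te += y1 - y0
        de += y10 - y0
        ie += y1 - y10
    return te / n, de / n, ie / n

te, de, ie = decompose()
print(f"TE={te:.3f} DE={de:.3f} IE={ie:.3f}")  # prints TE=1.700 DE=0.500 IE=1.200
```

Because the mechanisms are linear, every per-unit difference is constant, so the estimates equal B + A·C, B, and A·C up to floating-point error. In nonlinear models these differences vary across units, and the averaging is what makes the decomposition meaningful.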
Abstract
Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reasoning about fairness, as it links observed disparities to underlying mechanisms and aligns naturally with human intuition and legal notions of discrimination. Prior work on causal fairness primarily focuses on the standard machine learning setting, where a decision-maker constructs a single predictive mechanism for an outcome variable, while inheriting the causal mechanisms of all other covariates from the real world. The generative AI setting, however, is markedly more complex: generative models can sample from arbitrary conditionals over any set of variables, implicitly…
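The contrast the abstract draws between the two settings can be illustrated with a toy simulation, assuming linear mechanisms X → W → Y with hypothetical coefficients. In the standard ML setting only the outcome mechanism belongs to the model and W keeps its real-world mechanism. In a generative model, W is also sampled from the model's own (possibly different) mechanism, so the disparity in the generated outcome depends on how the model generates W as well. The helpers `sample` and `tv` and all coefficient values are illustrative assumptions, not the paper's estimators.

```python
import random

# Hypothetical coefficients chosen only for illustration.
A_REAL = 0.8             # real-world mechanism for W
A_GEN = 1.2              # mechanism for W learned by a hypothetical generative model
B_HAT, C_HAT = 0.5, 1.5  # learned mechanism for the outcome Y

def sample(n, a_w, b_y, c_y, seed):
    """Draw n pairs (X, Y) from X ~ Bernoulli(0.5), W := a_w*X + N(0,1), Y := b_y*X + c_y*W + N(0,1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        w = a_w * x + rng.gauss(0, 1)
        y = b_y * x + c_y * w + rng.gauss(0, 1)
        rows.append((x, y))
    return rows

def tv(rows):
    """Observed disparity E[Y | X=1] - E[Y | X=0] estimated from samples."""
    y1 = [y for x, y in rows if x == 1]
    y0 = [y for x, y in rows if x == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

# Standard ML: W keeps its real-world mechanism; only Y's mechanism is the model's.
standard = tv(sample(200_000, A_REAL, B_HAT, C_HAT, seed=1))
# Generative AI: the model samples W as well, from its own mechanism.
generative = tv(sample(200_000, A_GEN, B_HAT, C_HAT, seed=1))
print(f"standard ML: {standard:.2f}, generative: {generative:.2f}")
```

With these values the expected disparities are 0.5 + 1.5 × 0.8 = 1.7 and 0.5 + 1.5 × 1.2 = 2.3; the simulated values differ from these only by sampling noise. The same outcome mechanism thus yields different disparities depending on which mechanism produces W.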
