Causal Bias Detection in Generative Artificial Intelligence

Drago Plecko

arXiv:2605.11365·cs.AI·May 19, 2026

TL;DR

This paper develops a causal fairness framework tailored for generative AI models, enabling detailed analysis of bias pathways and mechanisms, with practical estimators demonstrated on language models.

Contribution

The paper introduces a causal fairness methodology designed specifically for generative AI, unifies it with the standard machine learning setting, and provides tools for quantifying bias.

Findings

01. New causal decomposition results for fairness impacts

02. Identification conditions and estimators for causal bias measures

03. Analysis of race and gender bias in large language models

Abstract

Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reasoning about fairness, as it links observed disparities to underlying mechanisms and aligns naturally with human intuition and legal notions of discrimination. Prior work on causal fairness primarily focuses on the standard machine learning setting, where a decision-maker constructs a single predictive mechanism $f_{Y}$ for an outcome variable $Y$, while inheriting the causal mechanisms of all other covariates from the real world. The generative AI setting, however, is markedly more complex: generative models can sample from arbitrary conditionals over any set of variables, implicitly…
