Uncovering Bias Paths with LLM-guided Causal Discovery: An Active Learning and Dynamic Scoring Approach
Khadija Zanna, Akane Sano

TL;DR
This paper introduces a hybrid causal discovery framework guided by large language models, combining active learning and dynamic scoring to better identify fairness-related causal pathways in noisy, confounded data.
Contribution
It proposes a novel LLM-guided causal discovery method with active learning and dynamic scoring, improving detection of fairness-critical paths in complex datasets.
Findings
LLM-guided methods outperform baselines in recovering fairness-relevant structure.
The active, dynamically scored approach improves robustness under noise.
The framework identifies bias pathways in semi-synthetic benchmarks.
Abstract
Ensuring fairness in machine learning requires understanding how sensitive attributes like race or gender causally influence outcomes. Existing causal discovery (CD) methods often struggle to recover fairness-relevant pathways in the presence of noise, confounding, or data corruption. Large language models (LLMs) offer a complementary signal by leveraging semantic priors from variable metadata. We propose a hybrid LLM-guided CD framework that extends a breadth-first search strategy with active learning and dynamic scoring. Variable pairs are prioritized for querying using a composite score combining mutual information, partial correlation, and LLM confidence, enabling more efficient and robust structure discovery. To evaluate fairness sensitivity, we introduce a semi-synthetic benchmark based on the UCI Adult dataset, embedding domain-informed bias pathways alongside noise and latent…
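The composite score described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the three signals are normalized or weighted, so the equal weights, the histogram-based mutual information estimate, the `mi / (1 + mi)` squashing, and the `llm_confidence` lookup table (a stand-in for real LLM queries) are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram-based estimate of mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def partial_correlation(data, i, j):
    """Absolute partial correlation of columns i and j given all other columns."""
    precision = np.linalg.pinv(np.cov(data, rowvar=False))
    r = -precision[i, j] / np.sqrt(precision[i, i] * precision[j, j])
    return float(abs(r))


def composite_score(mi, pcorr, llm_conf, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three signals (weights are an assumption)."""
    w_mi, w_pc, w_llm = weights
    mi_scaled = mi / (1.0 + mi)  # assumed squashing of MI into [0, 1)
    return w_mi * mi_scaled + w_pc * pcorr + w_llm * llm_conf


def rank_pairs(data, llm_confidence, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return variable pairs sorted by composite score, highest first."""
    n_vars = data.shape[1]
    scored = []
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            mi = mutual_information(data[:, i], data[:, j])
            pc = partial_correlation(data, i, j)
            # Hypothetical stand-in for an LLM query; 0.5 means "no opinion".
            conf = llm_confidence.get((i, j), 0.5)
            scored.append(((i, j), composite_score(mi, pc, conf, weights)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: column 1 depends strongly on column 0; column 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = x0 + 0.1 * rng.normal(size=500)
x2 = rng.normal(size=500)
data = np.column_stack([x0, x1, x2])

# Made-up confidences standing in for LLM responses about each pair.
llm_confidence = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.2}

ranking = rank_pairs(data, llm_confidence)
for pair, score in ranking:
    print(pair, round(score, 3))
```

In an active learning loop, the highest-ranked pairs would be the ones sent to the LLM first, and the scores could be recomputed as new edges are confirmed or rejected. How the paper performs that update is not specified in the abstract.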
Taxonomy
Topics: Software Engineering Research
