When Names Change Verdicts: Intervention Consistency Reveals Systematic Bias in LLM Decision-Making
Abhinaba Basu, Pavan Chakraborty

TL;DR
This paper introduces ICE-Guard, a framework that detects and reduces systematic biases in large language models across high-stakes domains through intervention consistency testing and iterative prompt patching.
Contribution
The paper presents ICE-Guard, a novel method for identifying and mitigating bias in LLM decision-making through intervention consistency testing and structured decomposition.
Findings
Authority bias (5.8%) and framing bias (5.0%) exceed demographic bias (2.2%).
Bias concentrates in specific domains: finance shows 22.6% authority bias, while criminal justice shows only 2.8%.
Structured decomposition reduces flip rates by up to 100%.
Abstract
Large language models (LLMs) are increasingly used for high-stakes decisions, yet their susceptibility to spurious features remains poorly characterized. We introduce ICE-Guard, a framework applying intervention consistency testing to detect three types of spurious feature reliance: demographic (name/race swaps), authority (credential/prestige swaps), and framing (positive/negative restatements). Across 3,000 vignettes spanning 10 high-stakes domains, we evaluate 11 LLMs from 8 families and find that (1) authority bias (mean 5.8%) and framing bias (5.0%) substantially exceed demographic bias (2.2%), challenging the field's narrow focus on demographics; (2) bias concentrates in specific domains -- finance shows 22.6% authority bias while criminal justice shows only 2.8%; (3) structured decomposition, where the LLM extracts features and a deterministic rubric decides, reduces flip rates…
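The idea of intervention consistency testing can be illustrated with a minimal sketch. It assumes that a "flip" means the decision changes when only a spurious feature is swapped, and that the flip rate is the fraction of vignette pairs that flip; the abstract does not spell out the exact metric. Every name here (`flip_rate`, `biased_decider`, `rubric_decider`, `swap_authority`, the vignette fields) is hypothetical, and the toy deciders stand in for real LLM calls. None of this is the authors' implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

Vignette = Dict[str, Any]
Decider = Callable[[Vignette], str]


def flip_rate(decide: Decider, pairs: List[Tuple[Vignette, Vignette]]) -> float:
    """Fraction of (original, intervened) pairs whose decisions differ."""
    if not pairs:
        return 0.0
    flips = sum(1 for original, intervened in pairs
                if decide(original) != decide(intervened))
    return flips / len(pairs)


def swap_authority(v: Vignette) -> Vignette:
    """Hypothetical authority intervention: flip only the prestige cue."""
    flipped = "low" if v["employer_prestige"] == "high" else "high"
    return {**v, "employer_prestige": flipped}


def biased_decider(v: Vignette) -> str:
    """Toy stand-in for an LLM that leans on a spurious prestige cue."""
    score = v["income"] / v["loan"]
    if v["employer_prestige"] == "high":
        score += 0.5
    return "approve" if score >= 1.0 else "deny"


def extract_features(v: Vignette) -> Dict[str, float]:
    """Stand-in for the LLM extraction step: keep task-relevant fields only."""
    return {"income": float(v["income"]), "loan": float(v["loan"])}


def rubric_decider(v: Vignette) -> str:
    """Structured decomposition: a deterministic rubric decides on extracted features."""
    features = extract_features(v)
    return "approve" if features["income"] / features["loan"] >= 1.0 else "deny"


# Made-up vignettes for illustration only.
vignettes = [
    {"income": 80, "loan": 100, "employer_prestige": "high"},
    {"income": 150, "loan": 100, "employer_prestige": "low"},
    {"income": 40, "loan": 100, "employer_prestige": "high"},
    {"income": 60, "loan": 100, "employer_prestige": "low"},
]
pairs = [(v, swap_authority(v)) for v in vignettes]

print(flip_rate(biased_decider, pairs))  # 0.5
print(flip_rate(rubric_decider, pairs))  # 0.0
```

In this toy setup the rubric never sees the prestige cue, so its flip rate is zero by construction. With a real LLM doing the extraction, the cue could still leak into the extracted features, so the flip rate would need to be measured rather than assumed.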
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods · Ethics and Social Impacts of AI
