The Geopolitics of AI Safety: A Causal Analysis of Regional LLM Bias

Alif Al Hasan

arXiv:2605.05427·cs.AI·May 8, 2026

TL;DR

This paper introduces a causal framework using Probabilistic Graphical Models to analyze regional biases in Large Language Models, revealing disparities between observational and causal bias assessments across diverse models.

Contribution

It presents a novel causal analysis method for LLM bias evaluation, highlighting limitations of traditional fairness metrics and uncovering regional bias trends.

Findings

01 Western models show higher causal refusal rates for certain demographics

02 Eastern models have low intervention rates but regional sensitivities

03 Standard fairness metrics may overestimate demographic bias

Abstract

As Large Language Models (LLMs) are integrated into global software systems, ensuring equitable safety guardrails is a critical requirement. Current fairness evaluations predominantly measure bias observationally, a methodology confounded by the inherent toxicity of topics naturally paired with specific demographics in testing datasets. This study introduces a Probabilistic Graphical Model (PGM) framework to audit LLM safety mechanisms causally. By applying Pearl's do-operator, we mathematically isolate the causal effect of injecting a cultural demographic into a prompt. We conduct a large-scale empirical analysis across seven instruction-tuned models spanning diverse origins: the United States (Llama-3.1-8B, Gemma-2-9B), Europe (Mistral-7B-v0.3), the UAE (Falcon3-7B), China (Qwen2.5-7B, DeepSeek-7B), and India (Airavata-7B). Utilizing two distinct datasets (ToxiGen and BOLD), the…
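The gap between an observational and an interventional refusal rate can be illustrated with a small sketch. It assumes a single confounder, a coarse topic-toxicity bin, and applies the standard backdoor adjustment P(R=1 | do(D=d)) = Σ_t P(R=1 | D=d, T=t) · P(T=t). The function name `refusal_rates`, the helper `make`, the record layout, and the toy counts are all hypothetical and are not taken from the paper, whose actual graph structure and adjustment set are not shown in this excerpt.

```python
from collections import defaultdict


def refusal_rates(records):
    """Return (observational, interventional) refusal rates per demographic.

    records: iterable of (demographic, toxicity_bin, refused) tuples.
    Assumption: toxicity_bin is the only confounder of demographic -> refusal.
    """
    counts = defaultdict(lambda: [0, 0])  # (d, t) -> [n_prompts, n_refused]
    tox_totals = defaultdict(int)         # t -> n_prompts
    for d, t, refused in records:
        counts[(d, t)][0] += 1
        counts[(d, t)][1] += int(refused)
        tox_totals[t] += 1

    n = sum(tox_totals.values())
    demographics = {d for d, _ in counts}
    observational, interventional = {}, {}

    for d in demographics:
        # Observational: P(R=1 | D=d), which inherits the topic mix paired with d.
        total = sum(c[0] for (dd, _), c in counts.items() if dd == d)
        refused = sum(c[1] for (dd, _), c in counts.items() if dd == d)
        observational[d] = refused / total

        # Interventional: reweight each toxicity stratum by its marginal P(T=t).
        adjusted = 0.0
        for t, n_t in tox_totals.items():
            n_dt, r_dt = counts.get((d, t), (0, 0))
            if n_dt == 0:
                # Positivity fails: d never appears with this toxicity level.
                raise ValueError(f"no prompts for demographic={d!r}, toxicity={t!r}")
            adjusted += (r_dt / n_dt) * (n_t / n)
        interventional[d] = adjusted

    return observational, interventional


def make(d, t, n, k):
    """Build n synthetic records for (d, t), of which the first k are refusals."""
    return [(d, t, i < k) for i in range(n)]


# Synthetic toy data: refusal depends only on toxicity (80% vs 20%),
# but group "A" is paired mostly with toxic prompts and "B" with benign ones.
records = (
    make("A", "toxic", 80, 64) + make("A", "benign", 20, 4)
    + make("B", "toxic", 20, 16) + make("B", "benign", 80, 16)
)
observational, interventional = refusal_rates(records)
```

In this toy setup the observational rates differ sharply between the two groups while the adjusted rates coincide, because the apparent disparity comes entirely from which topics each group is paired with. That is the kind of confounding the abstract attributes to observational fairness evaluations.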
