Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa
Hoffmann Muki, Olukunle Owolabi

TL;DR
This study evaluates the biases and robustness of four open-weight and two domain-adapted LLMs on conflict-event classification in West Africa (Nigeria and Cameroon). It finds significant normative and actor biases and highlights the need for better fairness and robustness.
Contribution
The paper provides empirical evidence on the normative and actor biases of LLMs in conflict monitoring and assesses the impact of domain adaptation on these biases.
Findings
Open-weight models exhibit significant false illegitimation bias.
Domain-adapted models achieve near-neutrality but retain actor-based biases.
Models are fragile to geography-specific lexical framing and masking normative biases.
Abstract
As LLMs enter conflict monitoring, understanding systematic distortions in their outputs is critical for humanitarian accountability. We evaluate four vanilla open-weight models (Gemma 3 4B, Llama 3.2 3B, Mistral 7B, and OLMo 2 7B) and two domain-adapted models (AfroConfliBERT and AfroConfliLLAMA) on conflict-event classification for Nigeria and Cameroon, using ACLED, a gold-standard dataset with multi-stage verification, as the reference. We find a bifurcated divergence in normative directionality. Open-weight models exhibit statistically significant False Illegitimation bias: Gemma misclassifies up to 18.29% of legitimate battles as civilian-targeted violence while making zero False Legitimation errors. By contrast, AfroConfliBERT and AfroConfliLLAMA achieve near-directional neutrality, with Legitimization Bias differences indistinguishable from zero. Yet domain adaptation does not eliminate actor-based…
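The directional error measures in the abstract can be made concrete with a short sketch. It rests on assumptions that are not taken from the paper: ACLED's "Battles" event type is treated as the legitimate class, "Violence against civilians" as the illegitimate one, and Legitimization Bias is taken to be the False Illegitimation rate minus the False Legitimation rate. The helper names (`directional_rates`, `legitimization_bias`, `bootstrap_ci`) are hypothetical. A percentile bootstrap interval that contains zero would correspond to a bias "indistinguishable from zero".

```python
import random
from typing import Sequence

# Assumed label mapping (hypothetical): which ACLED event types count as
# "legitimate" vs "illegitimate" violence for the directional metrics.
BATTLE = "Battles"
VAC = "Violence against civilians"


def directional_rates(gold: Sequence[str], pred: Sequence[str]) -> tuple[float, float]:
    """Return (false_illegitimation_rate, false_legitimation_rate).

    False Illegitimation: gold Battles predicted as violence against civilians.
    False Legitimation: gold violence against civilians predicted as Battles.
    """
    battle_preds = [p for g, p in zip(gold, pred) if g == BATTLE]
    vac_preds = [p for g, p in zip(gold, pred) if g == VAC]
    fi = sum(p == VAC for p in battle_preds) / len(battle_preds) if battle_preds else 0.0
    fl = sum(p == BATTLE for p in vac_preds) / len(vac_preds) if vac_preds else 0.0
    return fi, fl


def legitimization_bias(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Assumed definition: FI rate minus FL rate (positive = skews toward illegitimation)."""
    fi, fl = directional_rates(gold, pred)
    return fi - fl


def bootstrap_ci(gold: Sequence[str], pred: Sequence[str],
                 n_boot: int = 2000, alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for the bias, resampling events with replacement."""
    rng = random.Random(seed)
    n = len(gold)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(legitimization_bias([gold[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For example, if 2 of 10 gold Battles are predicted as violence against civilians and no civilian-targeting event is predicted as a Battle, `directional_rates` returns `(0.2, 0.0)` and the bias is `0.2`. That is the one-sided pattern the abstract reports for Gemma.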
