From Rogue to Safe AI: The Role of Explicit Refusals in Aligning LLMs with International Humanitarian Law

John Mavi; Diana Teodora G\u{a}itan; Sergio Coronado

arXiv:2506.06391·cs.CY·June 10, 2025

From Rogue to Safe AI: The Role of Explicit Refusals in Aligning LLMs with International Humanitarian Law

John Mavi, Diana Teodora G\u{a}itan, Sergio Coronado

PDF

Open Access

TL;DR

This paper assesses how well large language models can refuse prompts that violate International Humanitarian Law, emphasizing the importance of clear, explanatory refusals to enhance AI safety and transparency.

Contribution

It introduces a benchmark for evaluating LLMs' compliance with IHL and demonstrates the effectiveness of system-level safety prompts in improving refusal quality.

Findings

01

Most models rejected unlawful requests

02

Safety prompts improved explanation quality

03

Vulnerabilities remain with technical language requests

Abstract

Large Language Models (LLMs) are widely used across sectors, yet their alignment with International Humanitarian Law (IHL) is not well understood. This study evaluates eight leading LLMs on their ability to refuse prompts that explicitly violate these legal frameworks, focusing also on helpfulness - how clearly and constructively refusals are communicated. While most models rejected unlawful requests, the clarity and consistency of their responses varied. By revealing the model's rationale and referencing relevant legal or safety principles, explanatory refusals clarify the system's boundaries, reduce ambiguity, and help prevent misuse. A standardised system-level safety prompt significantly improved the quality of the explanations expressed within refusals in most models, highlighting the effectiveness of lightweight interventions. However, more complex prompts involving technical…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsEthics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)