Measuring and Eliminating Refusals in Military Large Language Models

Jack FitzGerald; Dylan Bates; Aristotelis Lazaridis; Aman Sharma; Vincent Lu; Brian King; Yousif Azami; Sean Bailey; Jeremy Cao; Peter Damianov; Kevin de Haan; Joseph Madigan; Jeremy McLaurin; Luke Kerbs; Jonathan Tainer; Dave Anderson; Jonathan Beck; Jamie Cuticello; Colton Malkerson; Tyler Saltsman

arXiv:2603.10012·cs.CL·March 12, 2026

Open Access

TL;DR

This paper introduces a benchmark dataset to measure refusal rates in military large language models and explores methods to reduce refusals, aiming to improve their utility in time-critical military scenarios.

Contribution

It presents what the authors describe as, to their knowledge, the first dataset for assessing refusal rates in military LLMs, and it evaluates techniques that significantly reduce refusals while maintaining task accuracy.

Findings

01. Hard rejection rates reach as high as 98.2% in some models

02. Results on the two synthetic datasets correlate with the gold refusal dataset

03. Abliteration increased the answer rate by 66.5 points

Abstract

Military Large Language Models (LLMs) must provide accurate information to the warfighter in time-critical and dangerous situations. However, today's LLMs are imbued with safety behaviors that cause the LLM to refuse many legitimate queries in the military domain, particularly those related to violence, terrorism, or military technology. Our gold benchmark for assessing refusal rates, which was developed by veterans of the US Army and special forces, is to our knowledge the first dataset of its kind. We present results for refusal and deflection rates on 31 public models and 3 military models. We observe hard rejection rates as high as 98.2% and soft deflection rates ranging from 0% to 21.3%. We also present results on two additional synthetic datasets and show their correlations with the gold dataset. Finally, we perform abliteration using the Heretic library on a military-tuned…
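The abstract reports per-model refusal and deflection rates and correlations between datasets. As a rough illustration of how such figures could be computed, here is a minimal sketch. It assumes each model response has already been labeled as "refusal", "deflection", or "answer"; the label names, function names, and sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt

# Hypothetical label set; the paper's actual annotation scheme may differ.
LABELS = ("refusal", "deflection", "answer")


def outcome_rates(labels):
    """Return the percentage of responses carrying each label."""
    total = len(labels)
    if total == 0:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return {name: 100.0 * counts[name] / total for name in LABELS}


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of rates."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def point_change(rate_before, rate_after):
    """Change in a rate, expressed in percentage points."""
    return rate_after - rate_before


# Illustrative data only: eight labeled responses from one made-up model.
sample = ["refusal"] * 3 + ["deflection"] + ["answer"] * 4
rates = outcome_rates(sample)
```

With per-model refusal rates on two datasets collected into two lists, `pearson` gives one possible measure of how closely a synthetic dataset tracks the gold dataset, and `point_change` expresses a before-and-after difference in percentage points, the unit used for the gain reported in the findings. Which correlation measure the authors actually used is not stated in the text shown here.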

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI