DeEscalWild: A Real-World Benchmark for Automated De-Escalation Training with SLMs

Md Hasebul Hasan; Krity Haque Charu; Eshwara Prasad Sridhar; Shuchisnigdha Deb; Mohammad A. Islam

arXiv:2604.13075 · cs.CL · May 8, 2026

TL;DR

DeEscalWild introduces a high-quality, real-world benchmark dataset for fine-tuning small language models (SLMs) that power automated de-escalation training for law enforcement. The work emphasizes scalability and realism.

Contribution

The paper presents a new dataset and benchmark for fine-tuning SLMs on police-civilian interactions. It shows that the fine-tuned models outperform their base models on de-escalation tasks.

Findings

01. SLMs fine-tuned on DeEscalWild outperform their base models across multiple metrics.

02. Qwen 2.5 (3B-Instruct), fine-tuned on DeEscalWild, surpasses Gemini 2.5 Flash in domain-specific evaluation.

03. The dataset enables the development of low-latency, privacy-preserving officer training systems.

Abstract

Effective de-escalation is critical for law enforcement safety and community trust, yet traditional training methods lack scalability and realism. While Large Language Models (LLMs) enable dynamic, open-ended simulations, their substantial computational footprint renders them impractical for deployment on the lightweight, portable hardware required for immersive field training. Small Language Models (SLMs) offer a viable real-time alternative but suffer from a critical scarcity of high-quality, domain-specific training data. To bridge this gap, we present DeEscalWild, a novel benchmark dataset curated through a multi-stage pipeline from in-the-wild police-civilian interactions extracted from publicly available video repositories. Starting with 5,000 raw inputs, we employed a rigorous hybrid filtering process combining human-in-the-loop verification with LLM-as-a-Judge evaluation to distill…
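The abstract describes the hybrid filtering step only at a high level, and the text breaks off before giving any details. The Python sketch below illustrates one way to combine an LLM-as-a-Judge with human-in-the-loop verification. The judge scores every candidate. Confident accepts and confident rejects are decided automatically, and only the uncertain middle band goes to a human reviewer. Everything here is a hypothetical assumption and not the authors' pipeline: the `Candidate` schema, the `hybrid_filter` function, the 0.8/0.3 thresholds, and the triage policy itself. The judge and the reviewer are stubbed as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Candidate:
    """One candidate police-civilian interaction (hypothetical schema)."""
    clip_id: str
    transcript: str
    judge_score: Optional[float] = None  # set by the LLM judge, assumed in [0, 1]
    route: str = ""                      # "auto_keep", "human", or "auto_drop"


def hybrid_filter(
    candidates: Iterable[Candidate],
    llm_judge: Callable[[str], float],
    human_verify: Callable[[Candidate], bool],
    keep_at: float = 0.8,     # hypothetical auto-accept threshold
    drop_below: float = 0.3,  # hypothetical auto-reject threshold
) -> List[Candidate]:
    """Triage candidates with an LLM judge and send uncertain ones to a human.

    Scores >= keep_at are kept automatically, scores < drop_below are
    discarded, and scores in between are decided by the human reviewer.
    """
    if not 0.0 <= drop_below <= keep_at <= 1.0:
        raise ValueError("expected 0 <= drop_below <= keep_at <= 1")
    kept: List[Candidate] = []
    for c in candidates:
        c.judge_score = llm_judge(c.transcript)
        if c.judge_score >= keep_at:
            c.route = "auto_keep"
            kept.append(c)
        elif c.judge_score < drop_below:
            c.route = "auto_drop"
        else:
            c.route = "human"  # uncertain: defer to human-in-the-loop review
            if human_verify(c):
                kept.append(c)
    return kept


# Usage with stubbed judge scores and a stubbed human reviewer.
scores = {"a": 0.95, "b": 0.55, "c": 0.10, "d": 0.60}
pool = [Candidate(k, f"transcript {k}") for k in scores]
kept = hybrid_filter(
    pool,
    llm_judge=lambda t: scores[t.split()[-1]],
    human_verify=lambda c: c.clip_id == "b",
)
print([c.clip_id for c in kept])  # ['a', 'b']
```

With this policy, human effort goes only to the clips the judge is unsure about, here "b" and "d". The cost is that confident judge errors pass through unchecked. A stricter variant would also require human approval for every auto-kept clip.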

Code & Models

Repositories

Hasebul/DeEscalWild-Benchmark-Framework (GitHub)