Dynamic Evaluation for Oversensitivity in LLMs

Sophia Xiao Pu; Sitao Cheng; Xin Eric Wang; William Yang Wang

arXiv:2510.19005·cs.CL·October 23, 2025

Dynamic Evaluation for Oversensitivity in LLMs

Sophia Xiao Pu, Sitao Cheng, Xin Eric Wang, William Yang Wang

PDF

Open Access 1 Video

TL;DR

This paper introduces a dynamic benchmarking framework and OVERBENCH, a large-scale, evolving dataset collection that assesses oversensitivity in language models, addressing limitations of static benchmarks and capturing emerging defensive behaviors.

Contribution

The paper presents a novel dynamic evaluation framework and OVERBENCH, the first large-scale, evolving benchmark for oversensitivity in LLMs, tailored to model-specific behaviors.

Findings

01

OVERBENCH contains 450,000 samples from 25 models.

02

Dynamic datasets reveal vulnerabilities missed by static benchmarks.

03

Framework enables continuous monitoring of model oversensitivity.

Abstract

Oversensitivity occurs when language models defensively reject prompts that are actually benign. This behavior not only disrupts user interactions but also obscures the boundary between harmful and harmless content. Existing benchmarks rely on static datasets that degrade overtime as models evolve, leading to data contamination and diminished evaluative power. To address this, we develop a framework that dynamically generates model-specific challenging datasets, capturing emerging defensive patterns and aligning with each model's unique behavior. Building on this approach, we construct OVERBENCH, a benchmark that aggregates these datasets across diverse LLM families, encompassing 450,000 samples from 25 models. OVERBENCH provides a dynamic and evolving perspective on oversensitivity, allowing for continuous monitoring of defensive triggers as models advance, highlighting vulnerabilities…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

Dynamic Evaluation for Oversensitivity in LLMs· underline

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Topic Modeling