Evaluating LLM Safety Under Repeated Inference via Accelerated Prompt Stress Testing

Keita Broadwater

arXiv:2602.11786 · cs.LG · April 29, 2026

TL;DR

This paper introduces Accelerated Prompt Stress Testing (APST), a framework for evaluating LLM safety under repeated inference that reveals failure modes traditional benchmarks do not capture.

Contribution

APST offers a depth-oriented, stochastic evaluation method for assessing LLM safety and reliability during sustained use, complementing existing benchmarks.

Findings

01. Models with similar shallow scores can have different failure rates under repeated inference.

02. APST uncovers latent failure modes like hallucinations and unsafe completions.

03. Repeated sampling reveals reliability differences not seen in single-sample evaluations.
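To illustrate why small per-response failure rates matter under sustained use, the sketch below computes the chance of seeing at least one failure across many repeated calls. It assumes each response fails independently with a fixed probability, which is a simplification introduced here and not a claim from the paper; the function name and the example rates are hypothetical and are not results reported by the author.

```python
def prob_at_least_one_failure(p_fail: float, n_calls: int) -> float:
    """Chance of one or more failures in n_calls independent calls.

    Assumes (hypothetically) that every call fails with the same
    probability p_fail, independently of the others.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if n_calls < 0:
        raise ValueError("n_calls must be non-negative")
    return 1.0 - (1.0 - p_fail) ** n_calls


if __name__ == "__main__":
    # Two illustrative per-call failure rates (made up for this example).
    for p in (0.001, 0.01):
        for n in (1, 100, 1000):
            print(f"p={p}, n={n}: {prob_at_least_one_failure(p, n):.4f}")
```

Under these assumptions, two models whose per-call rates both look negligible in a single-sample check can diverge sharply once the same prompt is issued hundreds or thousands of times.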

Abstract

Traditional benchmarks for large language models (LLMs), such as HELM and AIR-BENCH, primarily assess safety through breadth-oriented evaluation across diverse tasks and risk categories. However, real-world deployment often exposes a different class of risk: operational failures that arise under repeated inference on identical or near-identical prompts rather than from broad task-level underperformance. In high-stakes settings, response consistency and safety under sustained use are therefore critical. We introduce Accelerated Prompt Stress Testing (APST), a depth-oriented evaluation framework inspired by highly accelerated stress testing in reliability engineering. APST repeatedly samples identical prompts under controlled operational conditions (such as decoding temperature) to surface latent failure modes including hallucinations, refusal inconsistency, and unsafe completions. Rather…
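The repeated-sampling idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: `stress_test`, `make_stub_model`, the string-matching failure check, and the way the stub's failure chance scales with temperature are all assumptions made up for this example. A real run would replace the stub with an actual model call and the check with a proper safety or hallucination judge.

```python
import random
from typing import Callable

# Hypothetical signature: (prompt, temperature) -> response text.
Generate = Callable[[str, float], str]


def stress_test(generate: Generate,
                is_failure: Callable[[str], bool],
                prompt: str,
                n_samples: int,
                temperature: float) -> float:
    """Sample the same prompt n_samples times; return the observed failure rate."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    failures = sum(
        1 for _ in range(n_samples)
        if is_failure(generate(prompt, temperature))
    )
    return failures / n_samples


def make_stub_model(base_fail_prob: float, seed: int) -> Generate:
    """Toy stand-in for an LLM (assumption: failures grow with temperature)."""
    rng = random.Random(seed)

    def generate(prompt: str, temperature: float) -> str:
        p = min(1.0, base_fail_prob * temperature)
        return "UNSAFE" if rng.random() < p else "OK"

    return generate


if __name__ == "__main__":
    model = make_stub_model(base_fail_prob=0.02, seed=0)
    for temp in (0.2, 0.7, 1.0):
        rate = stress_test(model, lambda r: r == "UNSAFE",
                           "identical prompt", 2000, temp)
        print(f"temperature={temp}: observed failure rate {rate:.4f}")
```

Holding the prompt fixed and varying only the sampling depth and temperature is what makes this a depth-oriented check, in contrast to scoring many different prompts once each.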
