Evaluating Reliability Gaps in Large Language Model Safety via Repeated Prompt Sampling

Keita Broadwater

arXiv:2604.09606 · cs.AI · April 14, 2026


TL;DR

This paper introduces Accelerated Prompt Stress Testing (APST), a depth-oriented evaluation method for LLM safety that repeatedly samples the same prompt to reveal latent failure modes and operational risks.

Contribution

The paper proposes APST, a stress-testing framework inspired by reliability engineering, to assess LLM safety under repeated use and to quantify failure probabilities.

Findings

01 Repeated sampling uncovers variability in failure rates across models and temperatures.

02 Shallow benchmarks may hide significant reliability differences in sustained use.

03 APST effectively surfaces latent safety failure modes in instruction-tuned LLMs.
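
To see why a single-pass benchmark can understate sustained-use risk (finding 02), a short worked calculation may help. It is not taken from the paper. It assumes each generation fails independently with the same per-sample probability p, and both p and the number of repeated uses n are illustrative symbols and values chosen here.

```latex
% Assumption (not from the paper): independent generations, each failing with probability p.
% Probability of at least one failure across n repeated uses of the same prompt:
\[
  P(\text{at least one failure in } n \text{ uses}) = 1 - (1 - p)^{n}
\]
% Illustrative numbers only: p = 0.01, n = 100.
\[
  1 - (1 - 0.01)^{100} = 1 - 0.99^{100} \approx 0.634
\]
```

Under these assumptions, a prompt that passes 99% of the time in a single trial still produces at least one failure in roughly 63% of 100-use sessions, which is the kind of gap a breadth-oriented benchmark would not show.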

Abstract

Traditional benchmarks for large language models (LLMs), such as HELM and AIR-BENCH, primarily assess safety risk through breadth-oriented evaluation across diverse tasks. However, real-world deployment often exposes a different class of risk: operational failures arising from repeated generations of the same prompt rather than broad task generalization. In high-stakes settings, response consistency and safety under repeated use are critical operational requirements. We introduce Accelerated Prompt Stress Testing (APST), a depth-oriented evaluation framework inspired by highly accelerated stress testing in reliability engineering. APST probes LLM behavior by repeatedly sampling identical prompts under controlled operational conditions, including temperature variation and prompt perturbation, to surface latent failure modes such as hallucinations, refusal inconsistency, and unsafe…
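
The repeated-sampling idea in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: `stress_test`, `make_toy_model`, and the `"UNSAFE"` marker are hypothetical names, the toy model stands in for a real LLM call, and the Wilson score interval is one common choice for putting uncertainty bounds on an observed failure rate rather than a method the abstract names.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = failures / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def stress_test(
    generate: Callable[[str, float], str],
    is_failure: Callable[[str], bool],
    prompt: str,
    temperatures: List[float],
    n_samples: int,
) -> Dict[float, Tuple[float, Tuple[float, float]]]:
    """Sample the same prompt n_samples times per temperature.

    Returns, for each temperature, the observed failure rate and its Wilson interval.
    """
    results = {}
    for temperature in temperatures:
        failures = sum(
            1 for _ in range(n_samples) if is_failure(generate(prompt, temperature))
        )
        results[temperature] = (failures / n_samples, wilson_interval(failures, n_samples))
    return results


def make_toy_model(seed: int) -> Callable[[str, float], str]:
    """Hypothetical stand-in for an LLM: fails more often at higher temperature."""
    rng = random.Random(seed)

    def generate(prompt: str, temperature: float) -> str:
        # Assumed failure probability of 5% per unit of temperature, for illustration only.
        return "UNSAFE" if rng.random() < 0.05 * temperature else "SAFE"

    return generate


if __name__ == "__main__":
    model = make_toy_model(seed=0)
    report = stress_test(
        model, lambda response: response == "UNSAFE", "same prompt", [0.0, 0.5, 1.0], 500
    )
    for temperature, (rate, (low, high)) in report.items():
        print(f"T={temperature}: failure rate {rate:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

In practice `generate` would wrap a real model call and `is_failure` would be a safety classifier or rubric. Both are left as plain callables here so the sampling loop stays independent of any particular API.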
