Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration
Yuval Kainan, Shaked Zychlinski

TL;DR
This paper introduces a simple method to detect boilerplate responses in large language models after just one generation step, enabling early termination and reducing computational costs.
Contribution
The paper proposes using the first token's log-probability distribution as a signal to classify responses, demonstrating high accuracy across various models with minimal computation.
Findings
High accuracy in detecting boilerplate responses using first-token log-probabilities
Distinct clustering of response types across different models
Significant computational savings through early response classification
Abstract
Large Language Models (LLMs) often expend significant computational resources generating boilerplate responses, such as refusals, simple acknowledgements and casual greetings, which adds unnecessary cost and latency. To address this inefficiency, we propose a simple yet highly effective method for detecting such responses after only a single generation step. We demonstrate that the log-probability distribution of the first generated token serves as a powerful signal for classifying the nature of the entire subsequent response. Our experiments, conducted across a diverse range of small, large, and reasoning-specialized models, show that the first-token log-probability vectors form distinctly separable clusters for different response types. Using a lightweight k-NN classifier, we achieve high accuracy in predicting whether a response will be a substantive answer or a form of boilerplate…
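The pipeline the abstract describes (take the first generated token's log-probability vector, then classify it with k-NN) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-token vocabulary, the hand-written logits, the `substantive` label name, the value `k=3`, the Euclidean distance, and the helper names `log_softmax`, `knn_predict`, and `should_stop_early` are all hypothetical. In practice the vector would span the model's full vocabulary and come from a real forward pass.

```python
import math
from collections import Counter


def log_softmax(logits):
    """Convert raw logits into log-probabilities (numerically stable)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k training vectors nearest to the query.

    Euclidean distance is an assumption made for this sketch.
    """
    nearest = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def should_stop_early(label):
    """Hypothetical routing rule: stop generation for any boilerplate class."""
    return label != "substantive"


# Toy first-token logits over a made-up vocabulary ["I", "Hello", "Thanks", "The"].
# Real vectors would come from a model's first generation step.
toy_logits = {
    "refusal": [[5.0, 0.5, 0.2, 1.0], [4.5, 0.3, 0.4, 1.2], [5.2, 0.1, 0.0, 0.8]],
    "hello": [[0.5, 5.0, 0.3, 1.0], [0.2, 4.8, 0.6, 0.9], [0.4, 5.3, 0.1, 1.1]],
    "thanks": [[0.3, 0.4, 5.1, 1.0], [0.6, 0.2, 4.7, 1.3], [0.1, 0.5, 5.4, 0.9]],
    "substantive": [[1.0, 0.5, 0.4, 4.9], [1.2, 0.3, 0.6, 5.1], [0.8, 0.4, 0.2, 4.6]],
}

train_vectors = []
train_labels = []
for label, rows in toy_logits.items():
    for row in rows:
        train_vectors.append(log_softmax(row))
        train_labels.append(label)

# Classify two unseen first-token distributions.
refusal_like = log_softmax([4.8, 0.4, 0.3, 1.1])
answer_like = log_softmax([0.9, 0.6, 0.3, 5.0])

refusal_pred = knn_predict(train_vectors, train_labels, refusal_like)
answer_pred = knn_predict(train_vectors, train_labels, answer_like)
```

In this toy setup, `should_stop_early(refusal_pred)` signals that generation could be cut after the first step, while the substantive query continues decoding. How a stopped response is then served (for example, by a canned reply or a cheaper model) is left open here.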
Peer Reviews
Decision·Submitted to ICLR 2026
- Direct Approach. Reading a single first-token log-probability vector and classifying with k-NN delivers useful discrimination among boilerplate types.
- Clear Motivation. Framing the work around early stopping/routing aligns with real-world needs.
- Experimental Transparency. The paper describes data construction, fixed k in k-NN, and cross-validation, which helps readers reproduce the general setup.
- Adversarial Mixed Intents and False Positive. The dataset design around hello, refusal and thanks is reasonable, but it overlooks adversarial or mixed-intent prompts that can blur class boundaries. For example, a user input like “Hello, nice to meet you. How’s the weather today?” will likely elicit a first token such as “Hello,” followed only then by substantive content about weather. A first-token classifier is severely stressed in such cases, and the paper does not analyze this reliability g
The proposed method is extremely simple and computationally lightweight: it requires only a single forward pass to obtain the first token's probabilities, followed by a fast k-NN lookup. It uses the entire log-probability vector with a k-NN classifier rather than a manually selected subset of tokens, and it extends the classification beyond "Refusal" to also cover "Thanks" and "Hello".
The primary weakness of this paper is its limited novelty and contribution. The core idea that the first token's probabilities can predict the subsequent response, especially for refusals, is not new. The authors themselves cite related work (Arditi et al., 2024) which already derived a "refusal metric" by summing probabilities of "refusal tokens" at the first token position. The reliance on a k-NN classifier is sensitive to the training data. It's unclear how this approach would generalize to
* Addresses a practical problem of reducing inference costs for predictable responses
* **Fundamentally unsound task definition**: The paper groups refusals, greetings, and acknowledgements together as "boilerplate responses" that can be handled uniformly. This is deeply problematic. Refusals are safety-critical responses that embody the model's alignment training, not "boilerplate waste" to be optimized away. Replacing careful safety mechanisms with a k-NN classifier trained on 3k synthetic examples is inappropriate and potentially dangerous. The proposed solution of routing ref
