Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration

Yuval Kainan; Shaked Zychlinski

arXiv:2510.22679·cs.AI·October 28, 2025



TL;DR

This paper introduces a simple method to detect boilerplate responses in large language models after just one generation step, enabling early termination and reducing computational costs.

Contribution

The paper proposes using the first token's log-probability distribution as a signal to classify responses, demonstrating high accuracy across various models with minimal computation.

Findings

01. High accuracy in detecting boilerplate responses using first-token log-probabilities
02. Distinct clustering of response types across different models
03. Significant computational savings through early response classification

Abstract

Large Language Models (LLMs) often expend significant computational resources generating boilerplate responses, such as refusals, simple acknowledgements and casual greetings, which adds unnecessary cost and latency. To address this inefficiency, we propose a simple yet highly effective method for detecting such responses after only a single generation step. We demonstrate that the log-probability distribution of the first generated token serves as a powerful signal for classifying the nature of the entire subsequent response. Our experiments, conducted across a diverse range of small, large, and reasoning-specialized models, show that the first-token log-probability vectors form distinctly separable clusters for different response types. Using a lightweight k-NN classifier, we achieve high accuracy in predicting whether a response will be a substantive answer or a form of boilerplate…
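The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the log-probability vectors are synthetic stand-ins for real first-token model outputs, and the vocabulary size, class names, number of examples, and `k=3` are all assumptions made for illustration. Only the Python standard library is used.

```python
import math
import random
from collections import Counter

def log_softmax(logits):
    """Turn raw logits into a log-probability vector (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# --- Synthetic stand-in data (assumption: real vectors would come from an LLM) ---
rng = random.Random(0)
VOCAB_SIZE = 20  # toy vocabulary; real vocabularies are far larger
CLASSES = ["substantive", "refusal", "thanks", "hello"]  # illustrative labels

# Each response type gets its own "typical" first-token logit pattern.
centers = {c: [rng.gauss(0, 3) for _ in range(VOCAB_SIZE)] for c in CLASSES}

def sample(label):
    """Draw one noisy first-token log-probability vector for a response type."""
    logits = [mu + rng.gauss(0, 0.5) for mu in centers[label]]
    return log_softmax(logits)

train_X, train_y = [], []
for c in CLASSES:
    for _ in range(30):
        train_X.append(sample(c))
        train_y.append(c)

def is_boilerplate(first_token_logprobs, k=3):
    """Return (stop_early, predicted_label) after a single generation step."""
    label = knn_predict(train_X, train_y, first_token_logprobs, k)
    return label != "substantive", label
```

In a deployment along these lines, a `True` first element would be the signal to stop generation or route the request elsewhere, while `False` would let generation continue as usual.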

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 4

Strengths

- Direct Approach. Reading a single first-token log-probability vector and classifying with k-NN delivers useful discrimination among boilerplate types.
- Clear Motivation. Framing the work around early stopping/routing aligns with real-world needs.
- Experimental Transparency. The paper describes data construction, fixed k in k-NN, and cross-validation, which helps readers reproduce the general setup.

Weaknesses

- Adversarial Mixed Intents and False Positives. The dataset design around hello, refusal and thanks is reasonable, but it overlooks adversarial or mixed-intent prompts that can blur class boundaries. For example, a user input like “Hello, nice to meet you. How’s the weather today?” will likely elicit a first token such as “Hello,” with substantive content about the weather following only afterwards. A first-token classifier is severely stressed in such cases, and the paper does not analyze this reliability g

Reviewer 02·Rating 4·Confidence 3

Strengths

The proposed method is extremely simple and computationally lightweight: it requires only a single forward pass to obtain the first token's probabilities, followed by a fast k-NN lookup. It uses the entire log-probability vector with a k-NN classifier rather than a manually selected subset of tokens, and extends the classification beyond "Refusal" to also include "Thanks" and "Hello".
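The contrast drawn here between a hand-picked token subset and the full vector can be sketched as follows. This is a toy illustration under stated assumptions, not code from the paper or from the related work: the six-entry logit vector and the `REFUSAL_TOKEN_IDS` list are hypothetical.

```python
import math

def log_softmax(logits):
    """Turn raw logits into a log-probability vector (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def subset_probability(logprobs, token_ids):
    """Total probability mass placed on a manually chosen set of token ids."""
    return sum(math.exp(logprobs[i]) for i in token_ids)

# Hypothetical first-token logits over a 6-token toy vocabulary.
toy_logits = [2.0, 0.1, -1.0, 1.5, 0.0, -0.5]
# Hypothetical ids of tokens that tend to open a refusal.
REFUSAL_TOKEN_IDS = [0, 3]

logprobs = log_softmax(toy_logits)

# Subset-based score: one scalar, dependent on which tokens were hand-picked.
refusal_mass = subset_probability(logprobs, REFUSAL_TOKEN_IDS)

# Full-vector feature: every vocabulary entry is kept, and a classifier such
# as k-NN decides which dimensions matter for each response type.
full_vector_feature = logprobs
```

The scalar score needs a curated token list per response type, whereas the full vector can be reused unchanged when new classes such as "Thanks" or "Hello" are added.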

Weaknesses

The primary weakness of this paper is its limited novelty and contribution. The core idea that the first token's probabilities can predict the subsequent response, especially for refusals, is not new. The authors themselves cite related work (Arditi et al., 2024) which already derived a "refusal metric" by summing probabilities of "refusal tokens" at the first token position. The reliance on a k-NN classifier is sensitive to the training data. It's unclear how this approach would generalize to

Reviewer 03·Rating 0·Confidence 5

Strengths

* Addresses a practical problem of reducing inference costs for predictable responses

Weaknesses

* **Fundamentally unsound task definition**: The paper groups refusals, greetings, and acknowledgements together as "boilerplate responses" that can be handled uniformly. This is deeply problematic. Refusals are safety-critical responses that embody the model's alignment training, not "boilerplate waste" to be optimized away. Replacing careful safety mechanisms with a k-NN classifier trained on 3k synthetic examples is inappropriate and potentially dangerous. The proposed solution of routing ref

Code & Models

Datasets

jfrog/boilerplate-detection
dataset·20 dl
