The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices

Esteban Garces Arias; Nurzhan Sapargali; Christian Heumann; Matthias Aßenmacher

arXiv:2603.18482 · cs.CL · March 20, 2026

TL;DR

This paper shows that likelihood-based decoding strategies for text generation systematically exclude tokens that humans routinely choose, which makes machine-generated text easier to detect, and that decoding parameters affect detectability more than model size does.

Contribution

It identifies the truncation blind spot in decoding strategies, demonstrates its impact on text detectability, and shows that decoding choices matter more than model scale.

Findings

01. 8-18% of human-selected tokens fall outside typical truncation boundaries.

02. Simple classifiers can detect machine-generated text with high accuracy.

03. Decoding parameters influence detectability more than model size or architecture.
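
The first finding measures how often a human-chosen token lands outside the set a truncation rule keeps. A minimal Python sketch of that measurement, assuming we already have, for each position in a human-written text, the model's next-token distribution and the token the human actually wrote, might look as follows. The helper names (`top_k_set`, `top_p_set`, `fraction_outside`) and the toy distributions are hypothetical illustrations, not the authors' code; real pipelines may break ties or combine k and p filtering differently.

```python
def ranked(probs):
    # Token ids from most to least probable; sorted() is stable even with
    # reverse=True, so tied tokens keep their vocabulary order.
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k_set(probs, k):
    """Token ids that survive top-k truncation: the k most probable tokens."""
    return set(ranked(probs)[:k])


def top_p_set(probs, p):
    """Token ids that survive nucleus (top-p) truncation: the shortest
    most-probable prefix whose cumulative probability reaches p."""
    kept, cumulative = set(), 0.0
    for token_id in ranked(probs):
        kept.add(token_id)
        cumulative += probs[token_id]
        if cumulative >= p:
            break
    return kept


def fraction_outside(human_tokens, distributions, keep):
    """Share of human-chosen tokens that the truncation rule `keep` would discard."""
    outside = sum(
        1
        for token_id, probs in zip(human_tokens, distributions)
        if token_id not in keep(probs)
    )
    return outside / len(human_tokens)


# Hypothetical toy data: a 4-token vocabulary, one next-token distribution
# per position, and the token id a human actually wrote at that position.
distributions = [
    [0.6, 0.3, 0.08, 0.02],
    [0.5, 0.3, 0.15, 0.05],
    [0.4, 0.3, 0.2, 0.1],
    [0.7, 0.2, 0.05, 0.05],
]
human_tokens = [0, 2, 2, 1]

print(fraction_outside(human_tokens, distributions, lambda q: top_k_set(q, 2)))     # 0.5
print(fraction_outside(human_tokens, distributions, lambda q: top_p_set(q, 0.75)))  # 0.25
```

On this toy data, top-k with k = 2 discards two of the four human tokens, while top-p with p = 0.75 discards only one, because the nucleus widens at the third position where the distribution is flatter.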

Abstract

Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human language production operates differently: tokens are chosen for communicative appropriateness rather than statistical frequency. This mismatch creates a truncation blind spot: contextually appropriate but statistically rare tokens remain accessible to humans yet unreachable by likelihood-based decoding. We hypothesize this contributes to the detectability of machine-generated text. Analyzing over 1.8 million texts across eight language models, five decoding strategies, and 53 hyperparameter configurations, we find that 8-18% of human-selected tokens fall outside typical truncation boundaries. Simple classifiers trained on predictability and lexical diversity achieve remarkable detection…
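
The abstract attributes detection to simple classifiers over predictability and lexical diversity. As an illustration only, the sketch below assumes two stand-in features, mean per-token log-probability (predictability) and type-token ratio (lexical diversity), and swaps in a nearest-centroid classifier. The excerpt does not specify the authors' exact features or classifier, and the log-probabilities and feature rows here are hypothetical numbers rather than output from any model.

```python
from statistics import fmean


def mean_log_prob(token_log_probs):
    """Predictability (assumed feature): average log-probability a scoring model gave each token."""
    return fmean(token_log_probs)


def type_token_ratio(tokens):
    """Lexical diversity (assumed feature): distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def features(tokens, token_log_probs):
    return (mean_log_prob(token_log_probs), type_token_ratio(tokens))


def fit_centroids(feature_rows, labels):
    """Nearest-centroid training: average the feature vectors of each class."""
    centroids = {}
    for label in set(labels):
        rows = [row for row, y in zip(feature_rows, labels) if y == label]
        centroids[label] = tuple(fmean(column) for column in zip(*rows))
    return centroids


def predict(centroids, row):
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


# Hypothetical feature rows: (mean log-probability, type-token ratio).
train_rows = [(-1.5, 0.5), (-1.8, 0.55), (-3.0, 0.8), (-3.2, 0.75)]
train_labels = ["machine", "machine", "human", "human"]
centroids = fit_centroids(train_rows, train_labels)

example = features(["the", "cat", "sat", "on", "the", "mat"],
                   [-1.0, -2.0, -3.0, -1.0, -2.0, -3.0])
print(example)
print(predict(centroids, (-1.7, 0.6)))
print(predict(centroids, (-2.9, 0.7)))
```

In this toy setup, a high mean log-probability and a low type-token ratio pull a text toward the machine centroid. With real data the two features live on different scales, so standardizing them before measuring distances would be a sensible first step.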

Taxonomy

Topics: Authorship Attribution and Profiling · Language and cultural evolution · Natural Language Processing Techniques