The End of Manual Decoding: Towards Truly End-to-End Language Models

Zhichao Wang, Dongyang Ma, Xinting Huang, Deng Cai, Tian Lan, Jiahao Xu, Haitao Mi, Xiaoying Tang, and Yan Wang

arXiv:2510.26697 · cs.CL · November 3, 2025

TL;DR

This paper presents AutoDeco, an architecture enabling truly end-to-end language model decoding by learning to control sampling parameters dynamically, improving performance and interpretability over traditional static decoding methods.

Contribution

AutoDeco introduces a lightweight, token-level control mechanism for decoding that allows models to self-regulate sampling strategies within a single forward pass.
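To make the token-level mechanism concrete, here is a minimal sketch of what one decoding step does with a temperature and a top-p value once they are known: scale the logits by the temperature, keep the smallest nucleus of tokens whose cumulative probability reaches top-p, renormalise, and sample. The function names are hypothetical, and the sketch uses standard hard top-p filtering; it is not taken from the AutoDeco code. In AutoDeco the two values would be predicted by the model at every step rather than fixed by hand.

```python
import math
import random


def autodeco_style_distribution(logits, temperature, top_p):
    """Turn one step's logits into a sampling distribution (hypothetical helper).

    temperature and top_p are plain arguments here; in AutoDeco they would
    be emitted by the model itself for the current context.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus (top-p) filtering: keep the smallest set of highest-probability
    # tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalise over the kept tokens; every other token gets zero mass.
    mass = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / mass
    return out


def sample_token(logits, temperature, top_p, rng=random):
    """Sample one token index from the filtered distribution."""
    probs = autodeco_style_distribution(logits, temperature, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Calling `sample_token(logits, 0.7, 0.9)` with the same constants at every step is ordinary static decoding; the AutoDeco idea is to replace those two constants with values the model emits for each step's context.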

Findings

1. AutoDeco outperforms default decoding strategies across eight benchmarks.
2. The model learns to interpret natural-language commands for decoding control.
3. AutoDeco achieves performance comparable to oracle-tuned baselines.

Abstract

The "end-to-end" label for LLMs is a misnomer. In practice, they depend on a non-differentiable decoding process that requires laborious hand-tuning of hyperparameters like temperature and top-p. This paper introduces AutoDeco, a novel architecture that enables truly "end-to-end" generation by learning to control its own decoding strategy. We augment the standard transformer with lightweight heads that, at each step, dynamically predict context-specific temperature and top-p values alongside the next-token logits. This approach transforms decoding into a parametric, token-level process, allowing the model to self-regulate its sampling strategy within a single forward pass. Through extensive experiments on eight benchmarks, we demonstrate that AutoDeco not only significantly outperforms default decoding strategies but also achieves performance comparable to an oracle-tuned baseline…
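The architecture described in the abstract, lightweight heads that predict temperature and top-p alongside the next-token logits, can be sketched as three projections of the same final hidden state, so all three outputs come out of one forward pass. The linear heads, the softplus and sigmoid squashing, the small temperature floor, and every name below are assumptions for illustration; this page does not specify the heads' exact form.

```python
import math


def softplus(x):
    # Numerically stable log(1 + e^x); always positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Numerically stable logistic function; output lies in (0, 1).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(w, h):
    return sum(a * b for a, b in zip(w, h))


def forward_step(hidden, lm_head, temp_head, top_p_head):
    """One decoding step from a single final hidden state (hypothetical).

    lm_head:    list of weight rows, one per vocabulary token
    temp_head:  (weights, bias) of an assumed linear temperature head
    top_p_head: (weights, bias) of an assumed linear top-p head
    Returns (logits, temperature, top_p), all computed from the same hidden state.
    """
    logits = [dot(row, hidden) for row in lm_head]
    w_t, b_t = temp_head
    w_p, b_p = top_p_head
    # softplus keeps the temperature positive; the 1e-3 floor avoids T == 0.
    temperature = softplus(dot(w_t, hidden) + b_t) + 1e-3
    # sigmoid keeps top-p strictly inside (0, 1).
    top_p = sigmoid(dot(w_p, hidden) + b_p)
    return logits, temperature, top_p
```

The returned triple is what a per-step sampler with temperature scaling and nucleus filtering consumes; because the two extra heads are single dot products over a hidden state the transformer already computes, their added cost per token is small, which is one reading of "lightweight".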

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 2

Strengths

Novelty and Conceptual Contribution: The paper identifies and addresses a fundamental yet overlooked bottleneck in LLM deployment, the manual, non-differentiable decoding process. AutoDeco reframes decoding as a learnable and parametric component within the model itself, offering a principled step toward fully end-to-end generation.

Weaknesses

1. While the emergent instruction-following behavior is a highlight, the explanation for why this arises is mostly empirical. A deeper analysis (e.g., probing whether linguistic cues correlate with latent-space adjustments) would strengthen this claim.
2. Most benchmarks are reasoning- or QA-oriented. It would be valuable to test AutoDeco on creative writing, dialogue, or long-form generation, where decoding choices play a larger role. Human evaluation or qualitative examples of improved text q

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. The problem is timely and relevant. Removing manual tuning of decoding hyperparameters can significantly improve the practicality and usability of LLM-based systems.
2. The pseudo-label generation strategy for supervision is clever and helps circumvent the lack of direct ground-truth hyperparameter labels.

Weaknesses

1. A key concern is that the model is trained to further increase the likelihood of the reference text. Since both pre-training and downstream fine-tuning typically already optimize for the likelihood of the ground-truth sequence, this additional adjustment may risk overfitting or reduce robustness in more open-ended generation settings.
2. While dynamic prediction of decoding hyperparameters is appealing, different applications may require different behavior. For example, customer support syst
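Reviewer 02's first concern is that training pushes the model to further increase the likelihood of the reference text. A small calculation shows what such an objective does to a temperature head, under our own simplifying assumption of a plain cross-entropy loss on the reference token y under softmax(z / T), with top-p left out. The derivative is dL/dT = (z_y - E_q[z]) / T^2, where q = softmax(z / T), so whenever the reference token's logit is above the model's expected logit, gradient descent lowers T and sharpens the distribution toward the reference. This is a hedged sketch of the reviewer's point, not the paper's training code; the function names are hypothetical.

```python
import math


def log_softmax_at(logits, temperature, target):
    """log softmax(z / T)[target], computed with a stable log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return scaled[target] - lse


def nll_and_temp_grad(logits, temperature, target):
    """Cross-entropy of the reference token under softmax(z / T) and its
    analytic derivative with respect to T:

        dL/dT = (z_target - E_q[z]) / T**2,  with q = softmax(z / T).
    """
    loss = -log_softmax_at(logits, temperature, target)
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    # Expected logit under the temperature-scaled distribution q.
    expected_logit = sum(e / total * z for e, z in zip(exps, logits))
    grad = (logits[target] - expected_logit) / temperature ** 2
    return loss, grad
```

With logits `[2.0, 1.0, 0.0]` and the top-scoring token as the reference, the gradient is positive, so a descent step lowers T; if the reference is the lowest-scoring token, the sign flips and T is pushed up. A central finite difference of `log_softmax_at` around T agrees with the analytic value.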

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- The idea is interesting and, to my knowledge, novel.
- The problem space is richer than it first appears (e.g., obtaining training data for supervised training is surprisingly nontrivial).
- The results are fairly convincing (Fig. 2).
- They show this doesn't hurt performance (Fig. 3).

Weaknesses

- In practice, nobody does pure autoregressive decoding in real-world LLM usage at production scale. Everyone uses speculative decoding of some sort, and it's not clear to me whether this sampling scheme permits that or breaks it. I would want to see an explanation or formal argument/construction for how speculative decoding would work when the target model has an AutoDeco head to be convinced this would not break speculation, because if it does, then in practice it will never be used, which would be a m
