AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs

Chengming Cui; Tianxin Wei; Ziyi Chen; Ruizhong Qiu; Zhichen Zeng; Zhining Liu; Xuying Ning; Duo Zhou; Jingrui He

arXiv:2601.06022 · cs.CL · January 12, 2026

Open Access

TL;DR

AdaFuse is an adaptive ensemble decoding method for large language models. It adjusts fusion granularity dynamically during generation, using test-time scaling and uncertainty measures, and improves performance on question answering, reasoning, and translation tasks.

Contribution

The paper proposes an adaptive ensemble framework that dynamically selects fusion units during decoding, adding flexibility and improving performance without retraining.

Findings

01. Achieves an average 6.88% relative improvement over strong ensemble baselines.
02. Effectively adapts fusion granularity based on decoding context and uncertainty.
03. Demonstrates superior performance in question answering, reasoning, and translation tasks.
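
The first finding reports a relative improvement, not an absolute one. As a small worked example of the difference, the sketch below computes a relative improvement from two scores. The function name and the scores are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Hypothetical scores (NOT from the paper): a baseline of 50.0 and a new score of 53.44.
# The absolute gain is 3.44 points; the relative gain is measured against the baseline.
gain = relative_improvement(50.0, 53.44)
print(f"{gain:.2f}% relative improvement")
```

With these made-up numbers, an absolute gain of 3.44 points corresponds to a relative gain of about 6.88%, which shows why the two figures should not be read interchangeably.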

Abstract

Large language models (LLMs) exhibit complementary strengths arising from differences in pretraining data, model architectures, and decoding behaviors. Inference-time ensembling provides a practical way to combine these capabilities without retraining. However, existing ensemble approaches suffer from fundamental limitations. Most rely on fixed fusion granularity, which lacks the flexibility required for mid-generation adaptation and fails to adapt to different generation characteristics across tasks. To address these challenges, we propose AdaFuse, an adaptive ensemble decoding framework that dynamically selects semantically appropriate fusion units during generation. Rather than committing to a fixed granularity, AdaFuse adjusts fusion behavior on the fly based on the decoding context, with words serving as basic building blocks for alignment. To be specific, we introduce an…
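
To make the idea of adaptive fusion granularity concrete, here is a toy sketch in Python. It is not the paper's algorithm, whose details are cut off in the abstract above. Everything in it is an assumption made for illustration: the `WordModel` interface, the lookup-table models, the uniform averaging in `fuse`, the confidence `threshold`, and the rule that a confident step lets one "leader" model extend the unit alone. It only shows the general pattern of using words as units and switching between single-word and multi-word fusion based on confidence.

```python
from typing import Callable, Dict, List, Tuple

Dist = Dict[str, float]
# Hypothetical interface: context words -> distribution over the next word.
WordModel = Callable[[List[str]], Dist]
EOS = "<eos>"


def table_model(table: Dict[str, Dist]) -> WordModel:
    """Build a toy model that predicts the next word from the last word only."""
    def model(context: List[str]) -> Dist:
        last = context[-1] if context else ""
        return table.get(last, {EOS: 1.0})
    return model


def top(dist: Dist) -> Tuple[str, float]:
    """Most probable word and its probability (ties broken alphabetically)."""
    word = min(dist, key=lambda w: (-dist[w], w))
    return word, dist[word]


def fuse(dists: List[Dist]) -> Dist:
    """Uniformly average the next-word distributions of all models."""
    vocab = set().union(*dists)
    return {w: sum(d.get(w, 0.0) for d in dists) / len(dists) for w in vocab}


def adaptive_decode(
    models: List[WordModel],
    prompt: List[str],
    threshold: float = 0.7,   # assumed confidence cutoff
    max_span: int = 3,        # assumed maximum words per fusion unit
    max_words: int = 10,
) -> Tuple[List[str], List[List[str]]]:
    """Return the generated words and the fusion units they were committed in."""
    out: List[str] = []
    units: List[List[str]] = []
    while len(out) < max_words and (not out or out[-1] != EOS):
        dists = [m(prompt + out) for m in models]
        word, prob = top(fuse(dists))
        unit = [word]
        if prob >= threshold and word != EOS:
            # Confident step: the model that most favors `word` extends the unit
            # alone, for as long as it stays confident itself.
            leader = max(models, key=lambda m: m(prompt + out).get(word, 0.0))
            while len(unit) < max_span and len(out) + len(unit) < max_words:
                nxt, p = top(leader(prompt + out + unit))
                if p < threshold or nxt == EOS:
                    break
                unit.append(nxt)
        # Uncertain step: only the single fused word is committed.
        out.extend(unit)
        units.append(unit)
    return out, units


model_a = table_model({
    "in": {"new": 0.9, "old": 0.1},
    "new": {"york": 0.95, "jersey": 0.05},
    "york": {"city": 0.6, "state": 0.4},
})
model_b = table_model({
    "in": {"new": 0.8, "old": 0.2},
    "new": {"york": 0.7, "jersey": 0.3},
    "york": {"state": 0.7, "city": 0.3},
})
words, units = adaptive_decode([model_a, model_b], ["in"])
print(words, units)
```

In this toy run the two models agree confidently at first, so "new york" is committed as one two-word unit. They then disagree on the next word, so the sketch falls back to fusing a single word, and the averaged distribution picks "state" even though `model_a` alone would have chosen "city".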

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques