SpecFuse: Ensembling Large Language Models via Next-Segment Prediction

Bo Lv; Nayu Liu; Chen Tang; Xin Liu; Yue Yu; Ping Luo

arXiv:2412.07380 · cs.CL · March 9, 2026

TL;DR

SpecFuse introduces SpecEM, a dynamic, training-free ensemble framework for large language models that improves performance through segment-level collaboration and real-time weighting of each model by its task-specific performance.

Contribution

The paper presents SpecEM, a novel plug-and-play ensemble method that enables real-time, segment-level collaboration and adaptive weighting of LLMs without additional training.

Findings

01. Consistent performance improvements over state-of-the-art ensemble methods.

02. Effective dynamic weighting based on model performance during verification.

03. Applicable across multiple LLM sizes and diverse benchmark datasets.

Abstract

Ensembles of generative large language models (LLMs) are a promising way to compensate for individual model limitations, integrating the strengths of different LLMs. Existing LLM ensemble methods, however, face limitations such as first-token delay and challenges in long-range semantic collaboration between models. Moreover, they typically assume equal voting weights for all models during ensembling, ignoring task-specific performance differences among models. In this work, we propose SpecEM, a training-free, plug-and-play LLM ensemble framework that dynamically adjusts each model's contribution in real time based on task performance. Inspired by speculative decoding, SpecEM iteratively performs drafting and verification, allowing models to collaborate semantically at the segment level for integrated output. Furthermore, we introduce an online feedback mechanism with multiplicative…
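
The draft-then-verify loop with adaptive weights described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is cut off before the feedback rule is stated, so the update used here (multiply the winning model's weight by `1 + eta`, then renormalize) is an assumption, and `ensemble_generate`, `Drafter`, `Scorer`, `rounds`, and `eta` are hypothetical names. Real LLM calls are replaced by plain Python functions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins for LLM calls (assumptions, not from the paper).
Drafter = Callable[[str], str]        # prefix -> proposed next segment
Scorer = Callable[[str, str], float]  # (prefix, segment) -> score in [0, 1]


def ensemble_generate(
    drafters: Sequence[Drafter],
    scorers: Sequence[Scorer],
    prompt: str,
    rounds: int = 3,
    eta: float = 0.5,
) -> Tuple[str, List[float]]:
    """Toy segment-level ensemble: every model drafts, every model verifies."""
    n = len(drafters)
    weights = [1.0 / n] * n  # start from equal voting weights
    output = ""
    for _ in range(rounds):
        prefix = prompt + output
        # Drafting: each model proposes a candidate next segment.
        candidates = [draft(prefix) for draft in drafters]
        # Verification: each candidate gets a weighted sum of all models' scores.
        totals = [
            sum(w * score(prefix, cand) for w, score in zip(weights, scorers))
            for cand in candidates
        ]
        winner = max(range(n), key=totals.__getitem__)
        output += candidates[winner]
        # Assumed feedback rule: boost the winning drafter, then renormalize.
        weights[winner] *= 1.0 + eta
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return output, weights


# Toy usage: two "models"; both verifiers prefer the segment "b ".
toy_drafters = [lambda prefix: "a ", lambda prefix: "b "]
prefers_b = lambda prefix, segment: 1.0 if segment == "b " else 0.0
text, final_weights = ensemble_generate(toy_drafters, [prefers_b, prefers_b], "start: ")
```

In this toy run the second drafter wins every round, so its weight grows while the weights continue to sum to one. A real system would replace the drafters and scorers with model generation and likelihood-based scoring, which this sketch does not attempt to reproduce.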

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Balanced Selection