Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding

Ziyao Wang; Muneeza Azmat; Ang Li; Raya Horesh; Mikhail Yurochkin

arXiv:2502.08020·cs.CL·March 20, 2025

Open Access

TL;DR

This paper introduces a novel collaborative decoding algorithm that fuses knowledge from multiple language models during inference, enhancing accuracy, efficiency, and explainability without additional training.

Contribution

The paper presents CoSD, a new decoding method enabling test-time knowledge fusion of LLMs using a simple rule-based system, without requiring retraining.

Findings

01. Improves accuracy by up to 10% across benchmarks.

02. Enhances inference efficiency and explainability.

03. Transferable across different models and domains.

Abstract

Large Language Models (LLMs) often excel in specific domains but fall short in others due to the limitations of their training. Thus, enabling LLMs to solve problems collaboratively by integrating their complementary knowledge promises to improve their performance across domains. To realize this potential, we introduce a novel Collaborative Speculative Decoding (CoSD) algorithm that enables efficient LLM knowledge fusion at test time without requiring additional model training. CoSD employs a draft model to generate initial sequences and an easy-to-learn rule or decision tree to decide when to invoke an assistant model to improve these drafts. CoSD not only enhances knowledge fusion but also improves inference efficiency, is transferable across domains and models, and offers greater explainability. Experimental results demonstrate that CoSD improves accuracy by up to 10% across…
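
The draft-then-verify loop described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the models are hypothetical callables that return their top next token and its probability, the function names (`should_replace`, `cosd_sketch`) are invented, and the threshold rule with `alpha` and `beta` is only one plausible form of a "simple rule". The source does not specify the rule at this level of detail, and a real system would score all drafted positions with the assistant in one parallel pass rather than one call per position.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: given the tokens so far, return the
# most likely next token and the probability assigned to it.
Model = Callable[[List[str]], Tuple[str, float]]


def should_replace(draft_tok: str, draft_p: float,
                   assist_tok: str, assist_p: float,
                   alpha: float = 0.5, beta: float = 0.5) -> bool:
    """Illustrative rule (alpha and beta are assumed thresholds):
    swap in the assistant's token when the two models disagree, the
    draft model is unsure, and the assistant is comparatively confident."""
    return (draft_tok != assist_tok
            and draft_p < alpha
            and assist_p > beta * draft_p)


def cosd_sketch(draft: Model, assistant: Model, prompt: List[str],
                max_new_tokens: int = 8, block: int = 4,
                eos: str = "<eos>") -> List[str]:
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1) Speculate: the draft model proposes a short block of tokens.
        drafted, ctx = [], list(tokens)
        for _ in range(min(block, limit - len(tokens))):
            tok, p = draft(ctx)
            drafted.append((tok, p))
            ctx.append(tok)
            if tok == eos:
                break
        # 2) Collaborate: check each drafted position against the assistant.
        for tok, p in drafted:
            a_tok, a_p = assistant(tokens)
            if should_replace(tok, p, a_tok, a_p):
                tokens.append(a_tok)
                break  # drop the rest of the draft and re-draft from here
            tokens.append(tok)
        if tokens[-1] == eos:
            break
    return tokens


# Toy lookup-table "models" keyed on the last token, for demonstration only.
def toy_draft(ctx: List[str]) -> Tuple[str, float]:
    table = {"The": ("capital", 0.9), "capital": ("is", 0.9),
             "is": ("Lyon", 0.3), "Lyon": ("<eos>", 0.9),
             "Paris": ("<eos>", 0.9)}
    return table[ctx[-1]]


def toy_assistant(ctx: List[str]) -> Tuple[str, float]:
    table = {"The": ("capital", 0.8), "capital": ("is", 0.8),
             "is": ("Paris", 0.9), "Lyon": ("<eos>", 0.9),
             "Paris": ("<eos>", 0.9)}
    return table[ctx[-1]]
```

Calling `cosd_sketch(toy_draft, toy_assistant, ["The"])` keeps the draft model's confident tokens and substitutes the assistant's token only at the position where the draft model is unsure. A learned decision tree could take the place of `should_replace` with the same inputs, which is the second option the abstract mentions.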

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling