Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding

Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou

arXiv:2406.12295 · cs.CL · October 24, 2024 · 2 cites

TL;DR

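This paper introduces FS-GEN (Fast and Slow Generating), a unified framework inspired by dual-process theory for analyzing collaborative decoding between large and small language models. It finds that effective collaboration needs only a small share of interactions between the two and that these interactions follow a scaling law in the models' parameter ratio, which makes collaboration predictable.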

Contribution

The paper proposes a unified dual-process framework for analyzing how large and small language models collaborate during decoding, characterizing how often the large model needs to intervene and what this implies for efficiency.

Findings

1. Fewer than 20% of interactions between the two systems are needed for effective collaboration.

2. Collaborative interactions follow a scaling law related to the parameter ratio between the large and small models.

3. Interventions by System 2 are crucial for supporting System 1.
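The first finding is a statement about how often System 2 has to step in. The sketch below shows one way such an intervention rate can be measured in a fast/slow decoding loop. It assumes toy models exposed as hypothetical Python callables that return next-token distributions, plus a hypothetical entropy-threshold trigger for calling System 2. That trigger is an illustrative choice, not necessarily the switching rule studied in FS-GEN, and the code is not taken from the FS-GEN repository.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a model maps a token prefix to a next-token
# probability distribution over a toy vocabulary.
Model = Callable[[Sequence[int]], List[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def fast_slow_generate(system1: Model, system2: Model, prompt: Sequence[int],
                       max_new_tokens: int, threshold: float) -> Tuple[List[int], float]:
    """Greedy decoding in which System 1 (small model) handles each step and
    System 2 (large model) intervenes only when System 1 is uncertain.

    The entropy-threshold trigger is an illustrative assumption. Returns the
    new tokens and the fraction of steps at which System 2 intervened.
    """
    tokens = list(prompt)
    interventions = 0
    for _ in range(max_new_tokens):
        probs = system1(tokens)
        if entropy(probs) > threshold:
            probs = system2(tokens)  # slow, deliberate step
            interventions += 1
        tokens.append(max(range(len(probs)), key=probs.__getitem__))
    rate = interventions / max_new_tokens if max_new_tokens else 0.0
    return tokens[len(prompt):], rate


# Toy stand-ins: System 1 is uniform (uncertain) whenever the prefix length
# is a multiple of 3 and confident otherwise. System 2 always prefers token 2.
def toy_small(prefix: Sequence[int]) -> List[float]:
    if len(prefix) % 3 == 0:
        return [0.25, 0.25, 0.25, 0.25]
    return [0.97, 0.01, 0.01, 0.01]


def toy_large(prefix: Sequence[int]) -> List[float]:
    return [0.05, 0.05, 0.85, 0.05]


new_tokens, rate = fast_slow_generate(toy_small, toy_large, prompt=[1],
                                      max_new_tokens=10, threshold=1.0)
print(new_tokens, rate)  # [0, 0, 2, 0, 0, 2, 0, 0, 2, 0] 0.3
```

The toy rate of 0.3 comes only from the toy models and says nothing about the paper's measurements. With real models, the same counter would be applied to whatever intervention rule a given collaborative method uses.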

Abstract

Large Language Models (LLMs) exhibit impressive capabilities across various applications but encounter substantial challenges such as high inference latency, considerable training costs, and the generation of hallucinations. Collaborative decoding between large and small language models (SLMs) presents a promising strategy to mitigate these issues through methods including speculative decoding, contrastive decoding, and emulator or proxy fine-tuning. However, the specifics of such collaborations, particularly from a unified perspective, remain largely unexplored. Inspired by dual-process cognitive theory, we propose a unified framework in this paper, termed Fast and Slow Generating (FS-GEN). Within this framework, LLMs (sometimes along with SLMs) are categorized as System 2 (slow and deliberate), while independent SLMs are designated as System 1 (fast and intuitive). We provide a…
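The abstract names several collaborative decoding methods. The sketch below shows how three of them combine a large and a small model at a single decoding step, using the standard published forms of contrastive decoding, proxy-style tuning, and speculative sampling. Distributions are plain Python lists over a toy vocabulary. The function names are hypothetical, the code is not taken from the FS-GEN repository, and the contrastive scorer assumes strictly positive small-model probabilities. Emulator-style fine-tuning applies an offset analogous to the proxy-tuning rule, but in log-probability space.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_scores(p_large: Sequence[float], p_small: Sequence[float],
                       alpha: float = 0.1) -> List[float]:
    """Contrastive decoding: score tokens by log p_large - log p_small, keeping
    only tokens the large model finds plausible (p >= alpha * max p)."""
    cutoff = alpha * max(p_large)
    return [math.log(pl) - math.log(ps) if pl >= cutoff else -math.inf
            for pl, ps in zip(p_large, p_small)]


def proxy_tuned_probs(logits_large: Sequence[float],
                      logits_small_tuned: Sequence[float],
                      logits_small_base: Sequence[float]) -> List[float]:
    """Proxy-tuning-style combination: add the logit offset a fine-tuned small
    model learned over its untuned base to the large model's logits."""
    return softmax([l + t - b for l, t, b in
                    zip(logits_large, logits_small_tuned, logits_small_base)])


def speculative_accept(token: int, p_large: Sequence[float],
                       q_small: Sequence[float], rng: random.Random) -> int:
    """Speculative sampling check for one token drafted from q_small: keep it
    with probability min(1, p/q), else resample from max(0, p - q)."""
    if rng.random() < min(1.0, p_large[token] / q_small[token]):
        return token
    residual = [max(0.0, p - q) for p, q in zip(p_large, q_small)]
    return rng.choices(range(len(residual)), weights=residual)[0]


p_large, p_small = [0.6, 0.3, 0.1], [0.5, 0.1, 0.4]
scores = contrastive_scores(p_large, p_small)
print(max(range(3), key=scores.__getitem__))  # 1: favoured by the contrast, not by p_large alone

probs = proxy_tuned_probs([2.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0])
print(max(range(3), key=probs.__getitem__))  # 1

# Draft tokens from the small model, then verify them with the large model.
rng = random.Random(0)
drafts = rng.choices(range(3), weights=p_small, k=100_000)
kept = [speculative_accept(t, p_large, p_small, rng) for t in drafts]
print([round(kept.count(i) / len(kept), 2) for i in range(3)])  # close to p_large
```

The three rules differ in when the large model is consulted and how its distribution is merged with the small model's. Speculative sampling leaves the large model's output distribution unchanged, whereas contrastive decoding and proxy tuning deliberately reshape it.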

Code & Models

Repositories

tsinghuac3i/fs-gen (PyTorch, official implementation)

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Speech and Dialogue Systems · Natural Language Processing Techniques