C2T: A Classifier-Based Tree Construction Method in Speculative Decoding

Feiye Huo; Jianchao Tan; Kefeng Zhang; Xunliang Cai; Shengli Sun

arXiv:2502.13652 · cs.CL · February 20, 2025

TL;DR

C2T uses a lightweight classifier to build and prune the token tree dynamically during speculative decoding, reducing the number of candidate tokens sent for verification by about 25% compared to SOTA methods and making large language model inference more efficient.

Contribution

The paper presents a classifier-driven method for dynamic token tree construction that sends fewer candidate tokens to verification than existing strategies such as EAGLE-2 while maintaining or improving acceptance length.

Findings

01. Reduces candidate tokens by 25% compared to SOTA methods.

02. Maintains or improves acceptance length in decoding.

03. Outperforms EAGLE-2 on multiple benchmarks.

Abstract

The growing scale of Large Language Models (LLMs) has exacerbated inference latency and computational costs. Speculative decoding methods, which aim to mitigate these issues, often face inefficiencies in the construction of token trees and the verification of candidate tokens. Existing strategies, including chain mode, static tree, and dynamic tree approaches, have limitations in accurately preparing candidate token trees for verification. We propose a novel method named C2T that adopts a lightweight classifier to generate and prune token trees dynamically. Our classifier considers additional feature variables beyond the commonly used joint probability to predict the confidence score for each draft token to determine whether it is the candidate token for verification. This method outperforms state-of-the-art (SOTA) methods such as EAGLE-2 on multiple benchmarks, by reducing the total…
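
The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says the classifier uses "additional feature variables beyond the commonly used joint probability", so the features chosen here (log joint probability, entropy of the draft distribution, tree depth), the logistic form, the weights, the threshold, and every name (`DraftNode`, `confidence`, `select_candidates`) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DraftNode:
    token: str
    prob: float      # draft-model probability of this token given its parent
    entropy: float   # entropy of the draft distribution (assumed extra feature)
    children: list = field(default_factory=list)


# Hypothetical weights; a real classifier would be trained offline.
WEIGHTS = {"log_joint": 1.5, "entropy": -0.8, "depth": -0.2}
BIAS = 2.0


def confidence(joint_prob: float, entropy: float, depth: int) -> float:
    """Logistic score in (0, 1) computed from the three assumed features."""
    z = (
        BIAS
        + WEIGHTS["log_joint"] * math.log(max(joint_prob, 1e-12))
        + WEIGHTS["entropy"] * entropy
        + WEIGHTS["depth"] * depth
    )
    return 1.0 / (1.0 + math.exp(-z))


def select_candidates(root: DraftNode, threshold: float = 0.5) -> list:
    """Return token paths kept for verification.

    A node scoring below the threshold is dropped together with its whole
    subtree, so the tree is pruned while it is being traversed.
    """
    kept = []

    def visit(node, parent_joint, depth, path):
        joint = parent_joint * node.prob
        if confidence(joint, node.entropy, depth) < threshold:
            return  # prune this node and everything beneath it
        path = path + (node.token,)
        kept.append(path)
        for child in node.children:
            visit(child, joint, depth + 1, path)

    # The root stands for the last accepted token and is not scored itself.
    for child in root.children:
        visit(child, 1.0, 1, ())
    return kept
```

In this sketch a confident, low-entropy branch is expanded further, while an unlikely token is cut before any of its descendants are scored, which is how a classifier-based rule can shrink the set of candidates passed to the target model.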

Taxonomy

Topics: Neural Networks and Applications