SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding

Zhenglin Wang; Jialong Wu; Yilong Lai; Congzhi Zhang; Deyu Zhou

arXiv:2406.18200·cs.CL·December 18, 2024

TL;DR

This paper presents SeeD, an inference framework that accelerates reasoning tree construction in large language models through scheduled speculative decoding, reducing both inference latency and GPU memory usage.

Contribution

SeeD introduces a rounds-scheduled speculative execution strategy to optimize reasoning tree inference in LLMs, enhancing speed and efficiency.

Findings

01 Achieves speedup over baseline methods

02 Reduces GPU memory consumption during inference

03 Demonstrates effectiveness across three reasoning datasets

Abstract

Large Language Models (LLMs) demonstrate remarkable emergent abilities across various tasks, yet fall short on complex reasoning and planning tasks. Tree-search-based reasoning methods address this by encouraging exploration of intermediate steps, surpassing the capabilities of chain-of-thought prompting. However, such methods introduce significant inference latency because they systematically explore and evaluate multiple thought paths. This paper introduces SeeD, a novel and efficient inference framework that optimizes runtime speed and GPU memory management concurrently. By employing scheduled speculative execution, SeeD efficiently handles multiple iterations of thought generation and state evaluation, leveraging a rounds-scheduled strategy to manage draft model dispatching. Extensive experimental evaluations on three reasoning datasets demonstrate superior speedup…
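The idea of dispatching draft proposals in rounds to a shared verifier can be illustrated with a toy sketch. This is not the authors' implementation: the models below are deterministic stand-in functions, the queue-based round ordering is an assumption made for illustration, and all names (`target_model`, `draft_model`, `rounds_scheduled_decode`, and so on) are hypothetical. It only shows the general draft-then-verify loop of greedy speculative decoding applied to several branches in turn.

```python
from collections import deque
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

def target_model(prefix: List[int]) -> int:
    """Toy stand-in for the large target model (assumption, not a real LLM)."""
    return (sum(prefix) + len(prefix)) % 7

def draft_model(prefix: List[int]) -> int:
    """Toy stand-in for a small draft model: agrees with the target most of the time."""
    t = target_model(prefix)
    return t if len(prefix) % 4 != 3 else (t + 1) % 7

def speculate(prefix: List[int], draft: Model, k: int) -> List[int]:
    """Let the draft model propose k tokens autoregressively."""
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))
    return proposal

def verify(prefix: List[int], proposal: List[int], target: Model) -> List[int]:
    """Greedy verification: keep the matching prefix of the proposal, then add
    one token chosen by the target (a correction or a bonus token)."""
    accepted: List[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(tok)
    accepted.append(target(prefix + accepted))  # all accepted: one bonus token
    return accepted

def rounds_scheduled_decode(prompts: List[List[int]], draft: Model, target: Model,
                            k: int, max_new: int) -> List[List[int]]:
    """Serve several branches in rounds: each turn, one branch drafts k tokens
    and the shared target verifies them; unfinished branches rejoin the queue."""
    seqs = [list(p) for p in prompts]
    queue = deque(range(len(seqs)))
    while queue:
        i = queue.popleft()
        proposal = speculate(seqs[i], draft, k)
        seqs[i].extend(verify(seqs[i], proposal, target))
        if len(seqs[i]) - len(prompts[i]) < max_new:
            queue.append(i)
    # Trim any overshoot from the final round.
    return [s[:len(p) + max_new] for s, p in zip(seqs, prompts)]

def greedy_decode(prompt: List[int], target: Model, max_new: int) -> List[int]:
    """Reference: plain token-by-token greedy decoding with the target only."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq

if __name__ == "__main__":
    branches = [[1], [2, 3], [4]]
    print(rounds_scheduled_decode(branches, draft_model, target_model, k=3, max_new=8))
```

Under greedy verification, the output for each branch matches what the target model would produce on its own, so the draft model affects only how many target calls are needed, not the result.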

Code & Models

Repositories

Linking-ai/SEED
PyTorch · Official

Taxonomy

Topics: Formal Methods in Verification · Semantic Web and Ontologies · Logic, Reasoning, and Knowledge

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings