SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding
Zhenglin Wang, Jialong Wu, Yilong Lai, Congzhi Zhang, Deyu Zhou

TL;DR
This paper presents SeeD, an innovative inference framework that accelerates reasoning tree construction in large language models by using scheduled speculative decoding, significantly reducing inference latency and memory usage.
Contribution
SeeD introduces a rounds-scheduled speculative execution strategy to optimize reasoning tree inference in LLMs, enhancing speed and efficiency.
Findings
Achieves significant speedup over baseline methods
Reduces GPU memory consumption during inference
Demonstrates effectiveness across multiple reasoning datasets
Abstract
Large Language Models (LLMs) demonstrate remarkable emergent abilities across various tasks, yet fall short of complex reasoning and planning tasks. The tree-search-based reasoning methods address this by surpassing the capabilities of chain-of-thought prompting, encouraging exploration of intermediate steps. However, such methods introduce significant inference latency due to the systematic exploration and evaluation of multiple thought paths. This paper introduces SeeD, a novel and efficient inference framework to optimize runtime speed and GPU memory management concurrently. By employing a scheduled speculative execution, SeeD efficiently handles multiple iterations for the thought generation and the state evaluation, leveraging a rounds-scheduled strategy to manage draft model dispatching. Extensive experimental evaluations on three reasoning datasets demonstrate superior speedup…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFormal Methods in Verification · Semantic Web and Ontologies · Logic, Reasoning, and Knowledge
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
