Hierarchical Verification of Speculative Beams for Accelerating LLM Inference

Jaydip Sen; Harshitha Puvvala; Subhasis Dasgupta

arXiv:2508.03726·cs.CL·August 7, 2025

TL;DR

This paper introduces the Hierarchical Verification Tree (HVT), a novel framework that accelerates large language model inference by prioritizing high-likelihood drafts and enabling early pruning, reducing computational costs without retraining.

Contribution

The paper presents a hierarchical verification approach for speculative beam decoding that improves efficiency while preserving correctness, without modifying the model architecture or retraining.

Findings

01

HVT reduces inference time significantly across multiple datasets.

02

Energy consumption decreases with minimal impact on output quality.

03

HVT outperforms existing speculative decoding methods in speed and efficiency.

Abstract

Large language models (LLMs) have achieved remarkable success across diverse natural language processing tasks but face persistent challenges in inference efficiency due to their autoregressive nature. While speculative decoding and beam sampling offer notable improvements, traditional methods verify draft sequences sequentially without prioritization, leading to unnecessary computational overhead. This work proposes the Hierarchical Verification Tree (HVT), a novel framework that restructures speculative beam decoding by prioritizing high-likelihood drafts and enabling early pruning of suboptimal candidates. Theoretical foundations and a formal verification-pruning algorithm are developed to ensure correctness and efficiency. Integration with standard LLM inference pipelines is achieved without requiring retraining or architecture modification. Experimental evaluations across multiple…
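The abstract's core idea, verifying high-likelihood drafts first and pruning weak candidates early, can be illustrated with a minimal sketch. This is not the paper's HVT algorithm (the listing does not give its details): it uses a flat priority queue rather than a tree, and the names `verify_prioritized`, `accept_len`, `prune_margin`, and `common_prefix_len` are hypothetical, introduced only for illustration. The `accept_len` callback stands in for a target-model verification pass.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

# Assumed representation: (draft log-likelihood, token ids).
Draft = Tuple[float, Sequence[int]]


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the shared prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def verify_prioritized(
    drafts: List[Draft],
    accept_len: Callable[[Sequence[int]], int],
    prune_margin: float = 2.0,
) -> Tuple[List[int], int]:
    """Verify drafts from most to least likely, pruning unlikely ones.

    Returns the longest accepted token prefix and the number of drafts
    that were actually verified (a proxy for target-model cost).
    """
    # heapq is a min-heap, so negate the log-likelihood to pop the
    # most likely draft first; the index breaks ties.
    heap = [(-lp, i, toks) for i, (lp, toks) in enumerate(drafts)]
    heapq.heapify(heap)
    if not heap:
        return [], 0

    best_lp = -heap[0][0]
    best_tokens: List[int] = []
    verified = 0
    while heap:
        neg_lp, _, toks = heapq.heappop(heap)
        # Early pruning: every remaining draft is at least this unlikely.
        if best_lp - (-neg_lp) > prune_margin:
            break
        verified += 1
        n = accept_len(toks)  # stand-in for a target-model check
        if n > len(best_tokens):
            best_tokens = list(toks[:n])
        if n == len(toks):
            break  # fully accepted draft, no need to check the rest
    return best_tokens, verified
```

For example, with a hypothetical target continuation `[1, 2, 3]` and `accept_len=lambda t: common_prefix_len(t, [1, 2, 3])`, a low-likelihood draft is never sent for verification once a more likely draft has been fully accepted. The fixed `prune_margin` threshold is an assumption of this sketch; the paper's actual pruning criterion may differ.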
