B-PASTE: Beam-Aware Pattern-Guided Speculative Execution for Resource-Constrained LLM Agents

Yanfei Song

arXiv:2604.16469·cs.DC·April 21, 2026

TL;DR

B-PASTE extends speculative execution in LLM agents from individual tool calls to bounded future branch hypotheses managed under resource constraints, reducing end-to-end latency, particularly in edge environments.

Contribution

The paper introduces a beam-aware extension to PASTE that models branch hypotheses together with resource constraints, so that speculation targets local future branches rather than individual tool invocations.

Findings

01. Up to 1.4× end-to-end speedup in edge environments.

02. Effectiveness of branch-aware speculation under tight resources.

03. Prioritization of serial fast-path execution for early completion.

Abstract

LLM agents execute in an interleaved reasoning-and-action loop, where future tool calls cannot be launched until the current reasoning step completes. This serial dependency inflates end-to-end latency and leaves the model idle while waiting for tool execution. Prior work, Pattern-Aware Speculative Tool Execution (PASTE), mitigates this bottleneck by speculating likely future tool invocations from mined control-flow and data-flow regularities. However, PASTE is tool-centric and speculates only individual invocations rather than bounded future branches. We propose B-PASTE, a beam-aware extension that lifts speculation from single tools to local branch hypotheses under strict resource constraints. B-PASTE maintains a bounded beam of future execution subgraphs, ranks them by expected critical-path reduction rather than raw execution probability, and schedules only high-value branch…
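The ranking idea in the abstract (score branches by expected critical-path reduction rather than raw execution probability, and keep only a bounded beam within a resource budget) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `BranchHypothesis` fields, the score defined as probability times latency saved, the greedy budget check, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchHypothesis:
    """Hypothetical record for one candidate future branch (not the paper's type)."""
    name: str
    probability: float    # estimated chance the agent actually takes this branch
    latency_saved: float  # critical-path seconds saved if it was pre-executed
    cost: float           # resource units needed to speculate on it

    @property
    def expected_reduction(self) -> float:
        # Assumed scoring rule: expected critical-path reduction.
        return self.probability * self.latency_saved


def select_branches(candidates, beam_width, budget):
    """Keep the top `beam_width` branches by expected reduction, then
    greedily schedule those that still fit in the remaining budget."""
    ranked = sorted(candidates, key=lambda b: b.expected_reduction, reverse=True)
    beam = ranked[:beam_width]
    scheduled, remaining = [], budget
    for branch in beam:
        if branch.cost <= remaining:
            scheduled.append(branch)
            remaining -= branch.cost
    return scheduled


# Illustrative numbers only.
candidates = [
    BranchHypothesis("A", probability=0.9, latency_saved=0.2, cost=1.0),
    BranchHypothesis("B", probability=0.5, latency_saved=2.0, cost=2.0),
    BranchHypothesis("C", probability=0.3, latency_saved=1.0, cost=2.0),
    BranchHypothesis("D", probability=0.6, latency_saved=0.1, cost=1.0),
]

chosen = select_branches(candidates, beam_width=3, budget=3.0)
print([b.name for b in chosen])
```

In this toy setup, branch A is the most probable, yet B ranks first because pre-executing it would remove far more latency from the critical path. C makes it into the beam but is skipped once B has consumed most of the budget, which is the kind of trade-off a resource-constrained scheduler has to make.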
