Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness

Zizhao Chen; Yuying Li; Siting Lin; Lianxi Wang

arXiv:2605.11019 · cs.LG · May 13, 2026

TL;DR

This paper introduces VPG-EA, a variational inference framework that guides large language models toward more efficient reasoning. It distills answer-guided posterior reasoning patterns into the prior policy and improves the paper's efficiency metric by 8.73% on 1.5B models and by 12.37% on 7B models.

Contribution

The paper formalizes efficient reasoning as a variational inference problem, grounded in an efficiency-aware evidence lower bound. It also proposes a dual-stream architecture that transfers efficiency patterns from the posterior to the prior policy.
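The transfer of posterior efficiency patterns to the prior policy can be illustrated with a toy distillation step. The sketch below fits trainable "prior stream" logits to a fixed "posterior stream" distribution over four candidate reasoning chains, using gradient descent on the forward KL divergence. It is only a sketch under stated assumptions. The candidate chains, the probabilities, and the names `distill_prior` and `kl` are hypothetical, and the paper's dual-stream architecture operates on LLM token distributions rather than on a four-way categorical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(q, p):
    """Forward KL divergence KL(q || p) between two categorical distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def distill_prior(posterior, steps=2000, lr=0.5):
    """Hypothetical distillation: fit prior-stream logits so softmax(logits) matches the posterior.

    Uses plain gradient descent; the gradient of KL(q || softmax(theta))
    with respect to theta is softmax(theta) - q.
    """
    logits = [0.0] * len(posterior)  # start from a uniform prior policy
    for _ in range(steps):
        p = softmax(logits)
        logits = [t - lr * (pi - qi) for t, pi, qi in zip(logits, p, posterior)]
    return softmax(logits)

# Hypothetical answer-guided posterior over four candidate chains:
# short-correct, long-correct, short-wrong, long-wrong.
posterior = [0.70, 0.20, 0.06, 0.04]
prior = distill_prior(posterior)
print([round(x, 3) for x in prior], kl(posterior, prior))
```

Forward KL is used here because it is the simplest objective whose minimizer is exactly the posterior. This summary does not say whether the paper uses forward KL, reverse KL, or a different transfer objective.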

Findings

01. VPG-EA improves the efficiency metric ε³ by 8.73% on 1.5B models.

02. VPG-EA improves the efficiency metric ε³ by 12.37% on 7B models.

03. Experiments demonstrate significant efficiency gains over strong baselines.

Abstract

Although large language models rely on chain-of-thought reasoning for complex problems, the overthinking phenomenon severely degrades inference efficiency. Existing reinforcement learning methods compress reasoning chains through elaborate reward functions. These rewards make high-quality samples extremely sparse in the exploration space and create a sampling bottleneck for the prior policy. Drawing on cognitive science, we prove that a posterior distribution guided by reference answers achieves higher expected utility than the prior distribution. The posterior can therefore break through the sampling bottleneck for high-quality samples. However, the posterior distribution is unavailable during inference. To address this, we formalize efficient reasoning as a variational inference problem and introduce an efficiency-aware evidence lower bound as its theoretical foundation. Based on this, we…
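The abstract's two theoretical ingredients can be checked on a toy example. The sketch below makes the following assumptions:

- a prior policy π(z) over four hypothetical reasoning chains z;
- a likelihood p(y*|z) that each chain reaches the reference answer y*;
- a hypothetical efficiency-aware utility that discounts correctness by chain length.

All numbers and names are illustrative. The code forms the answer-guided posterior q(z) ∝ π(z) p(y*|z) and compares expected utility under q and π. It also evaluates the standard evidence lower bound log p(y*) ≥ E_q[log p(y*|z)] − KL(q‖π). This is the textbook ELBO, not the paper's efficiency-aware bound, whose form is not reproduced here.

```python
import math

# Hypothetical toy setup: four candidate reasoning chains z for one question.
# Each entry: (prior pi(z), likelihood p(y*|z) of reaching the reference answer, chain length in tokens)
chains = {
    "short_correct": (0.15, 0.90, 200),
    "long_correct": (0.35, 0.80, 1200),
    "short_wrong": (0.20, 0.10, 180),
    "long_wrong": (0.30, 0.15, 1100),
}

def utility(lik, length, max_len=2000):
    """Hypothetical efficiency-aware utility: expected correctness discounted by relative length."""
    return lik * (1.0 - length / max_len)

prior = {z: pi for z, (pi, _, _) in chains.items()}
evidence = sum(pi * lik for pi, lik, _ in chains.values())  # p(y*) = sum_z pi(z) p(y*|z)
posterior = {z: pi * lik / evidence for z, (pi, lik, _) in chains.items()}  # Bayes' rule

def expected_utility(dist):
    """Expected utility of the toy chains under a distribution over z."""
    return sum(dist[z] * utility(lik, length) for z, (_, lik, length) in chains.items())

def elbo(q):
    """Standard ELBO: E_q[log p(y*|z)] - KL(q || pi)."""
    return sum(
        q[z] * (math.log(lik) - math.log(q[z] / prior[z]))
        for z, (_, lik, _) in chains.items()
        if q[z] > 0
    )

print("E_prior[U]      =", round(expected_utility(prior), 4))
print("E_posterior[U]  =", round(expected_utility(posterior), 4))
print("log p(y*)       =", round(math.log(evidence), 4))
print("ELBO(posterior) =", round(elbo(posterior), 4))  # equals log p(y*): the bound is tight
print("ELBO(prior)     =", round(elbo(prior), 4))  # lower, because q = pi differs from the posterior
```

In this toy, E_q[U] − E_π[U] = Cov_π(U, p(y*|z)) / p(y*). The posterior therefore wins because utility is positively correlated with reaching the answer under the prior. That correlation is an assumption of the example, not the paper's proof. The ELBO gap equals KL(q‖p(z|y*)), so the bound is tight exactly when q is the true posterior.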
