Path-Consistency with Prefix Enhancement for Efficient Inference in LLMs
Jiace Zhu, Yuanzhe Huang, Yingtao Shen, Jie Zhao, An Zou

TL;DR
This paper introduces path-consistency with prefix enhancement, a method that makes large language model inference more efficient by reducing the redundancy and errors of random sampling, improving inference latency by up to 40.5% without sacrificing accuracy.
Contribution
The paper proposes path-consistency, which uses the confidence of earlier-generated answers to select the most promising prefix and guide the generation of subsequent branches, reducing inference time while maintaining accuracy.
Findings
Inference latency improved by up to 40.5%.
Maintains task accuracy across multiple reasoning tasks.
Reduces errors and redundancies from random sampling.
Abstract
To enhance the reasoning capabilities of large language models (LLMs), self-consistency has become a popular approach, combining multiple samplings with majority voting. However, current methods are computationally expensive and time-consuming due to the need for numerous samplings. To address this, this paper introduces path-consistency, which leverages the confidence of earlier-generated answers to identify the most promising prefix and guide the generation of subsequent branches. By dynamically guiding the generation of subsequent branches based on this prefix, path-consistency mitigates both the errors and redundancies from random or less useful sampling in self-consistency, significantly accelerating inference by minimizing token consumption. Our extensive empirical results demonstrate that path-consistency…
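The contrast the abstract draws between self-consistency and path-consistency can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' implementation: the `Sampler` callable, the batch size, the threshold, the vote-share confidence measure, and the one-step-per-batch prefix growth are all hypothetical choices made for illustration. The sampler is assumed to return reasoning steps that begin with the prefix it was given.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical sampler: given prefix steps, returns (all reasoning steps, answer).
# The returned steps are assumed to start with the given prefix.
Sampler = Callable[[List[str]], Tuple[List[str], str]]


def self_consistency(sample: Sampler, n: int) -> str:
    """Baseline: draw n independent paths and majority-vote on the answers."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [sample([])[1] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


def path_consistency_sketch(
    sample: Sampler, n: int, batch: int = 4, threshold: float = 0.6
) -> Tuple[str, List[str]]:
    """Toy variant: sample in batches; when the leading answer's vote share
    reaches the threshold, extend the shared prefix by one step taken from
    a path in the latest batch that reached that answer."""
    if n < 1 or batch < 1:
        raise ValueError("n and batch must be at least 1")
    prefix: List[str] = []
    paths: List[Tuple[List[str], str]] = []
    votes: Counter = Counter()
    while len(paths) < n:
        # New branches reuse the current prefix instead of starting from scratch.
        new = [sample(prefix) for _ in range(min(batch, n - len(paths)))]
        paths.extend(new)
        votes = Counter(answer for _, answer in paths)
        top, count = votes.most_common(1)[0]
        confidence = count / len(paths)  # assumed confidence measure: vote share
        candidates = [steps for steps, answer in new if answer == top]
        if confidence >= threshold and candidates:
            steps = candidates[0]
            # Keep at least one step free so later branches can still differ.
            if len(prefix) < len(steps) - 1:
                prefix = steps[: len(prefix) + 1]
    return votes.most_common(1)[0][0], prefix
```

In this sketch the saving would come from the shared prefix: branches sampled after the prefix grows only need to generate the remaining steps, which is one way to read the abstract's "minimizing token consumption". A real sampler would wrap an LLM call; here any callable matching `Sampler` works.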
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Handwritten Text Recognition Techniques
