Latent Self-Consistency for Reliable Majority-Set Selection in Short- and Long-Answer Reasoning
Jungsuk Oh, Jay-Yoon Lee

TL;DR
The paper introduces Latent Self-Consistency (LSC), a novel method that improves the reliability of answer selection in large language models across short- and long-form questions by using learnable token embeddings with minimal computational overhead.
Contribution
LSC is a new, lightweight approach that enhances consistency-based answer selection in LLMs without altering model architecture or significantly increasing inference time.
Findings
LSC outperforms SC, USC, and WUCS across 6 short-form and 5 long-form reasoning benchmarks.
LSC maintains low calibration error across answer formats.
LSC adds less than 1% runtime overhead during inference.
Abstract
Probabilistic decoding in Large Language Models (LLMs) often yields inconsistent outputs, particularly on complex or long-form questions. Self-Consistency (SC) mitigates this for short-form QA by majority voting over exact strings, whereas Universal Self-Consistency (USC) and Weighted Unigram Consistency Score (WUCS) extend to long-form responses but lose accuracy on short-form benchmarks. We introduce Latent Self-Consistency (LSC), which selects the most semantically consistent response using learnable token embeddings. LSC's lightweight forward processing of summary tokens introduces only negligible runtime overhead (at most ) on top of standard decoding of the base LLM, and requires no changes to the model architecture. Across 6 short-form and 5 long-form reasoning benchmarks (e.g., MATH, MMLU, TruthfulQA), LSC surpasses SC, USC, and WUCS on both short-form and…
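The two selection styles contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `majority_vote` shows SC-style voting over exact strings, while `most_consistent_index` is a hypothetical stand-in for consistency-based selection in an embedding space. It assumes each sampled response has already been mapped to a vector by some means, whereas LSC itself derives these from learnable summary tokens, which this sketch does not model.

```python
from collections import Counter
import math


def majority_vote(answers):
    """SC-style selection: return the most frequent exact string and its vote share."""
    counts = Counter(a.strip() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_consistent_index(embeddings):
    """Hypothetical helper: index of the response whose embedding has the
    highest mean cosine similarity to all other responses.
    Assumption: embeddings are supplied by the caller; this is not LSC's
    learned summary-token representation."""
    n = len(embeddings)
    if n == 1:
        return 0
    scores = []
    for i in range(n):
        sims = [cosine(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims))
    return max(range(n), key=scores.__getitem__)


# Short-form answers: exact-string voting works directly.
answer, share = majority_vote(["42", "41", "42"])

# Long-form answers: exact strings rarely repeat, so compare vectors instead.
# These toy 2-D vectors are made up for illustration.
best = most_consistent_index([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```

Exact-string voting breaks down when no two long-form responses match verbatim, which is why the second function compares responses in a vector space rather than by string equality.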
