Distributional Energy-Based Models for Uncertainty-Aware Structured LLM Reasoning
Shireen Kudukkil Manchingal, Abhey Kalia, Fernanda Gonçalves, Shebin Rawther

TL;DR
This paper introduces a distributional energy-based verification method for structured LLM outputs, improving accuracy and constraint adherence across multiple benchmarks by combining learned quality scoring with analytical constraints.
Contribution
The paper presents a decomposed energy function whose learned quality term is a heterogeneous ensemble verifier. The resulting system improves verification of structured reasoning and outperforms a much larger single-shot model (Qwen-72B).
Findings
Outperforms single-shot Qwen-72B on all benchmarks
Reduces constraint violations by 53% on TravelPlanner
Achieves 93.9% accuracy on GSM8K without prior math training
Abstract
When Large Language Models produce structured outputs such as travel plans, code solutions, or multi-step proofs, individual reasoning steps may appear correct while the output as a whole violates budgets, fails test cases, or contradicts earlier deductions. We propose a decomposed energy function that combines a learned quality scorer with deterministic analytical constraint penalties for verifying structured LLM outputs. The quality scorer is a heterogeneous ensemble of low-rank adapters on a single frozen encoder (3% trainable parameters); the ensemble mean ranks candidates while the standard deviation quantifies epistemic uncertainty, driving a two-pass inference loop that triggers targeted regeneration or abstention. Across five benchmarks (GSM8K, MuSR, TravelPlanner, TACO, Knights & Knaves), our 149M-parameter verifier orchestrating a pool of 7-26B open generators outperforms…
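The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `score`, `select`, `regenerate`, the weight `lam`, the threshold `tau`, and the sign convention (energy = negative mean quality plus weighted penalty, lower is better) are all assumptions introduced here. Ensemble members are stood in for by plain Python callables.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Optional, Sequence

Scorer = Callable[[str], float]    # one ensemble member: higher = better quality
Penalty = Callable[[str], float]   # deterministic constraint penalty: 0 = no violation


@dataclass
class Scored:
    candidate: str
    energy: float        # lower is better (assumed convention)
    uncertainty: float   # std of ensemble scores, a proxy for epistemic uncertainty


def score(candidate: str, scorers: Sequence[Scorer], penalty: Penalty,
          lam: float = 1.0) -> Scored:
    """Decomposed energy: learned quality term plus analytical penalty term."""
    qualities = [s(candidate) for s in scorers]
    energy = -mean(qualities) + lam * penalty(candidate)
    return Scored(candidate, energy, pstdev(qualities))


def _best(candidates: Sequence[str], scorers: Sequence[Scorer],
          penalty: Penalty, lam: float) -> Optional[Scored]:
    """Lowest-energy candidate, or None if there are no candidates."""
    scored = (score(c, scorers, penalty, lam) for c in candidates)
    return min(scored, key=lambda s: s.energy, default=None)


def select(candidates: Sequence[str], scorers: Sequence[Scorer], penalty: Penalty,
           regenerate: Optional[Callable[[str], Sequence[str]]] = None,
           tau: float = 0.2, lam: float = 1.0) -> Optional[Scored]:
    """Two-pass loop: accept, regenerate once if uncertain, else abstain (None)."""
    first = _best(candidates, scorers, penalty, lam)
    if first is None:
        return None
    if first.uncertainty <= tau:
        return first                      # pass 1: ensemble agrees, accept
    if regenerate is not None:
        second = _best(regenerate(first.candidate), scorers, penalty, lam)
        if second is not None and second.uncertainty <= tau:
            return second                 # pass 2: regenerated output accepted
    return None                           # abstain


if __name__ == "__main__":
    # Toy stand-ins: two scorers that roughly agree, and a budget-style penalty.
    scorers = [lambda c: len(c) / 10, lambda c: len(c) / 10 + 0.1]
    penalty = lambda c: 5.0 if "over" in c else 0.0
    print(select(["ok plan", "over budget plan"], scorers, penalty))
```

In this toy run the longer candidate gets a higher quality score but is rejected because the deterministic penalty dominates its energy. That is the intended effect of keeping the constraint term analytical rather than learned.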
