Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks

Debargha Ganguly; Vikash Singh; Sreehari Sankar; Biyao Zhang; Xuecen Zhang; Srinivasan Iyengar; Xiaotian Han; Amit Sharma; Shivkumar Kalyanaraman; Vipin Chaudhary

arXiv:2505.20047 · cs.CL · May 27, 2025

TL;DR

This paper investigates the limitations of LLMs in formal reasoning, proposing a probabilistic grammar framework to better quantify uncertainty and improve reliability in automated formalization tasks.

Contribution

It introduces a PCFG-based uncertainty modeling approach and a fusion method for selective verification, enhancing trustworthiness of LLM-generated formal artifacts.

Findings

01. Uncertainty signals are task-dependent, with high AUROC scores.

02. SMT-based autoformalization has a domain-specific effect on accuracy: up to +34.8% on logical tasks, but as low as -44.5% on factual ones.

03. Selective verification reduces errors by 14-100% with minimal abstention.
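The trade-off between error reduction and abstention in the last finding can be illustrated with a minimal sketch. It assumes each generated artifact carries a single scalar uncertainty score and a correctness label, and that the system abstains whenever the score exceeds a threshold. The function names, the threshold rule, and the toy data are hypothetical and are not taken from the paper, and the relative-reduction formula is one common way to compute such a figure rather than the paper's confirmed definition.

```python
from typing import Sequence, Tuple


def selective_error(
    scores: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Abstain when score > threshold; return (abstention_rate, error_rate_on_kept).

    Hypothetical helper: `scores` are uncertainty values (higher = less trusted),
    `correct` marks whether each output was actually right.
    """
    if len(scores) != len(correct) or not scores:
        raise ValueError("scores and correct must be non-empty and equal length")
    kept = [c for s, c in zip(scores, correct) if s <= threshold]
    abstention_rate = 1 - len(kept) / len(scores)
    # With nothing kept there are no answered items, so report zero error.
    error_rate = sum(1 for c in kept if not c) / len(kept) if kept else 0.0
    return abstention_rate, error_rate


def relative_error_reduction(baseline_error: float, selective_error_rate: float) -> float:
    """Fraction of the baseline error removed by abstaining (assumed definition)."""
    if baseline_error == 0:
        return 0.0
    return (baseline_error - selective_error_rate) / baseline_error


# Toy data (made up for illustration): one wrong output with a high score.
scores = [0.1, 0.2, 0.9, 0.3]
correct = [True, True, False, True]

baseline = selective_error(scores, correct, threshold=1.0)   # answer everything
selective = selective_error(scores, correct, threshold=0.5)  # abstain on 0.9
reduction = relative_error_reduction(baseline[1], selective[1])
```

In this toy case, abstaining on one of four outputs removes the only error. Real uncertainty scores separate correct from incorrect outputs less cleanly, which is why the reported reduction spans a range.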

Abstract

Large language models (LLMs) show remarkable promise for democratizing automated reasoning by generating formal specifications. However, a fundamental tension exists: LLMs are probabilistic, while formal verification demands deterministic guarantees. This paper addresses this epistemological gap by comprehensively investigating failure modes and uncertainty quantification (UQ) in LLM-generated formal artifacts. Our systematic evaluation of five frontier LLMs reveals that autoformalization based on Satisfiability Modulo Theories (SMT) has a domain-specific impact on accuracy (from +34.8% on logical tasks to -44.5% on factual ones), and that known UQ techniques, such as the entropy of token probabilities, fail to identify these errors. We introduce a probabilistic context-free grammar (PCFG) framework to model LLM outputs, yielding a refined uncertainty taxonomy. We find uncertainty signals are…
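To make the idea of modeling LLM outputs with a PCFG concrete, here is a minimal sketch of one way it could work: sample several formalizations of the same problem, record which grammar production each one uses, estimate rule probabilities by relative frequency, and measure the entropy of the rule distribution at each nonterminal. This is an illustration under stated assumptions, not the paper's implementation. The derivation format, the nonterminal names (`Cmd`, `Expr`), and both function names are hypothetical, and a real pipeline would need a parser for the target language to obtain the derivations.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# A production is (left-hand nonterminal, right-hand symbols).
Production = Tuple[str, Tuple[str, ...]]


def estimate_pcfg(
    derivations: Iterable[List[Production]],
) -> Dict[str, Dict[Tuple[str, ...], float]]:
    """Maximum-likelihood rule probabilities from observed production uses."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for derivation in derivations:
        for lhs, rhs in derivation:
            counts[lhs][rhs] += 1
    return {
        lhs: {rhs: n / sum(c.values()) for rhs, n in c.items()}
        for lhs, c in counts.items()
    }


def rule_entropy(pcfg: Dict[str, Dict[Tuple[str, ...], float]]) -> Dict[str, float]:
    """Shannon entropy (bits) of the rule choice at each nonterminal."""
    return {
        lhs: sum(p * math.log2(1 / p) for p in dist.values())
        for lhs, dist in pcfg.items()
    }


# Two hypothetical sampled formalizations of the same problem: both emit an
# assertion, but they disagree on the top-level connective.
samples = [
    [("Cmd", ("assert", "Expr")), ("Expr", ("=>", "Expr", "Expr"))],
    [("Cmd", ("assert", "Expr")), ("Expr", ("and", "Expr", "Expr"))],
]

pcfg = estimate_pcfg(samples)
entropy = rule_entropy(pcfg)
```

In this toy case the samples agree on `Cmd` (zero entropy) and split evenly on `Expr` (one bit), so the disagreement is localized to a structural choice rather than averaged over individual tokens.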

Taxonomy

Topics: Multi-Agent Systems and Negotiation