ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning

Kranthi Kommuru; Kunal Khanvilkar; Gaurav Parekh

arXiv:2604.06401·cs.AI·April 9, 2026

ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning

Kranthi Kommuru, Kunal Khanvilkar, Gaurav Parekh

PDF

TL;DR

ProofSketcher introduces a hybrid approach combining large language models and a lightweight proof checker to enhance reliability in mathematical and logical reasoning without full formalization.

Contribution

It presents a novel pipeline where LLMs generate proof sketches in a DSL, which are then expanded into explicit proof obligations by a trusted kernel.

Findings

01

The system can produce reliable proof sketches from LLMs.

02

It reduces the formalization burden compared to full theorem provers.

03

The approach improves trustworthiness in automated reasoning.

Abstract

The large language models (LLMs) might produce a persuasive argument within mathematical and logical fields, although such argument often includes some minor missteps, including the entire omission of side conditions, invalid inference patterns, or appeals to a lemma that cannot be derived logically out of the context being discussed. These omissions are infamously hard to notice solely out of the text, as even the misconstrued construction still may seem mostly accurate. Conversely, interactive theorem provers like Lean and Coq have rigorous reliability by ensuring that syntactic and semantic statements only accept statements that can pass all the syntactic and semantic steps in the program which is a small trusted kernel of the language type-checks with. Despite the fact that this technique provides strong guarantees, it comes at quite a heavy price: the evidence must be completely…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.