Steering LLMs for Formal Theorem Proving

Shashank Kirtania; Arun Iyer

arXiv:2502.15507·cs.LG·October 14, 2025

Steering LLMs for Formal Theorem Proving

Shashank Kirtania, Arun Iyer

PDF

TL;DR

This paper introduces an activation steering method that guides large language models to improve formal theorem proving by manipulating internal activations, enhancing proof generation without additional training.

Contribution

It proposes a novel activation-based intervention for LLMs that improves formal proof synthesis and offers insights into internal reasoning representations.

Findings

01

Activation steering improves proof accuracy in LLMs.

02

Method works with multiple decoding strategies.

03

No additional training required for performance gains.

Abstract

Recent advances in automated theorem proving use Large Language Models (LLMs) to translate informal mathematical statements into formal proofs. However, informal cues are often ambiguous or lack strict logical structure, making it hard for models to interpret them precisely. While existing methods achieve strong performance, little is known about how LLMs internally represent informal cues, or how these influence proof generation. To address this, we explore \textit{activation steering}, an inference-time intervention that identifies linear directions in residual activations associated with informal reasoning traces and adjusts them to improve proof construction without fine-tuning. This mechanism also yields interpretable information about how reasoning is internally encoded in the activation space of LLMs. We test our method for generating formal proofs from already-formalized…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

MethodsSparse Evolutionary Training