Steering LLMs for Formal Theorem Proving
Shashank Kirtania, Arun Iyer

TL;DR
This paper introduces an activation steering method that guides large language models to improve formal theorem proving by manipulating internal activations, enhancing proof generation without additional training.
Contribution
It proposes a novel activation-based intervention for LLMs that improves formal proof synthesis and offers insights into internal reasoning representations.
Findings
Activation steering improves proof accuracy in LLMs.
Method works with multiple decoding strategies.
No additional training required for performance gains.
Abstract
Recent advances in automated theorem proving use Large Language Models (LLMs) to translate informal mathematical statements into formal proofs. However, informal cues are often ambiguous or lack strict logical structure, making it hard for models to interpret them precisely. While existing methods achieve strong performance, little is known about how LLMs internally represent informal cues, or how these influence proof generation. To address this, we explore \textit{activation steering}, an inference-time intervention that identifies linear directions in residual activations associated with informal reasoning traces and adjusts them to improve proof construction without fine-tuning. This mechanism also yields interpretable information about how reasoning is internally encoded in the activation space of LLMs. We test our method for generating formal proofs from already-formalized…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsSparse Evolutionary Training
