RLSF: Fine-tuning LLMs via Symbolic Feedback
Piyush Jha, Prithwish Jana, Pranavkrishna Suresh, Arnav Arora, Vijay Ganesh

TL;DR
This paper introduces RLSF, a fine-tuning method for LLMs in which symbolic reasoning tools provide fine-grained, error-correcting feedback, improving performance on domain-specific tasks without requiring the reasoning tools to be differentiable.
Contribution
The paper presents RLSF, a new fine-tuning paradigm that leverages symbolic reasoning tools for precise, token-level feedback, bridging symbolic reasoning and LLM training.
Findings
RLSF outperforms traditional fine-tuning on five tasks.
Smaller LLMs fine-tuned with RLSF surpass larger models.
RLSF effectively incorporates domain constraints into LLMs.
Abstract
Large Language Models (LLMs) have transformed AI but often struggle with tasks that require domain-specific reasoning and logical alignment. Traditional fine-tuning methods do not leverage the vast amount of symbolic domain-knowledge available to us via symbolic reasoning tools (e.g., provers), and are further limited by sparse rewards and unreliable reward models. We introduce Reinforcement Learning via Symbolic Feedback (RLSF), a novel fine-tuning paradigm where symbolic reasoning tools (e.g., solvers, provers, and algebra systems) provide fine-grained feedback to LLMs. RLSF uses poly-sized certificates (e.g., proofs) generated by symbolic tools to identify and correct errors in model outputs, offering token-level guidance without requiring differentiable reasoning systems. This paradigm bridges the gap between symbolic reasoning and LLM fine-tuning, enabling precise alignment with…
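To make the idea of token-level symbolic feedback concrete, here is a minimal sketch, not the authors' implementation. It assumes Python's built-in compiler as a stand-in for the symbolic tool (the abstract names solvers, provers, and algebra systems), whitespace splitting as a stand-in for an LLM tokenizer, and reward values of +1.0 and -1.0. The function names `symbolic_check` and `token_rewards` are hypothetical.

```python
def symbolic_check(source):
    """Stand-in symbolic tool (assumption): Python's own compiler.

    Returns None if the candidate compiles, otherwise the 1-based
    line number of the first reported syntax error.
    """
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as err:
        return err.lineno


def token_rewards(source, ok=1.0, bad=-1.0):
    """Turn the tool's error location into per-token rewards.

    Tokens on the line flagged by the tool receive `bad`; all other
    tokens receive `ok`. The reward values are illustrative only.
    """
    error_line = symbolic_check(source)
    rewards = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for tok in line.split():  # whitespace tokens, not a real tokenizer
            rewards.append((tok, bad if lineno == error_line else ok))
    return rewards


if __name__ == "__main__":
    candidate = "x = 1\ny = = 2\n"  # second line is malformed
    for tok, reward in token_rewards(candidate):
        print(f"{tok!r}: {reward}")
```

The point of the sketch is the shape of the signal: a scalar pass/fail reward would penalize every token of the candidate equally, whereas a tool that localizes the error lets the penalty fall only on the offending span. In a real RL fine-tuning loop this reward vector would be aligned with the model's own tokenization and passed to the policy-gradient update, which is outside the scope of this example.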
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
