RLSF: Fine-tuning LLMs via Symbolic Feedback

Piyush Jha; Prithwish Jana; Pranavkrishna Suresh; Arnav Arora; Vijay Ganesh

arXiv:2405.16661 · cs.CL · February 27, 2026

Open Access

TL;DR

This paper introduces RLSF, a fine-tuning method for LLMs in which symbolic reasoning tools provide fine-grained, error-correcting feedback, improving performance on domain-specific tasks without requiring those tools to be differentiable.

Contribution

The paper presents RLSF, a new fine-tuning paradigm that leverages symbolic reasoning tools for precise, token-level feedback, bridging symbolic reasoning and LLM training.

Findings

01 RLSF outperforms traditional fine-tuning on five tasks.

02 Smaller LLMs fine-tuned with RLSF surpass larger models.

03 RLSF effectively incorporates domain constraints into LLMs.

Abstract

Large Language Models (LLMs) have transformed AI but often struggle with tasks that require domain-specific reasoning and logical alignment. Traditional fine-tuning methods do not leverage the vast amount of symbolic domain knowledge available via symbolic reasoning tools (e.g., provers), and are further limited by sparse rewards and unreliable reward models. We introduce Reinforcement Learning via Symbolic Feedback (RLSF), a novel fine-tuning paradigm where symbolic reasoning tools (e.g., solvers, provers, and algebra systems) provide fine-grained feedback to LLMs. RLSF uses poly-sized certificates (e.g., proofs) generated by symbolic tools to identify and correct errors in model outputs, offering token-level guidance without requiring differentiable reasoning systems. This paradigm bridges the gap between symbolic reasoning and LLM fine-tuning, enabling precise alignment with…
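
The idea of turning a symbolic tool's certificate into token-level guidance can be illustrated with a toy sketch. This is not the authors' implementation: it assumes Python's built-in parser as a stand-in for the symbolic tool, treats the reported error line as the "certificate", and splits on whitespace instead of using an LLM tokenizer. The function names and reward values are hypothetical.

```python
import ast
from typing import List, Optional, Tuple


def symbolic_feedback(code: str) -> Tuple[bool, Optional[int]]:
    """Hypothetical stand-in for a symbolic tool: Python's own parser.

    Returns (ok, error_line), where error_line is the 1-indexed line of the
    first syntax error and plays the role of a small certificate.
    """
    try:
        ast.parse(code)
        return True, None
    except SyntaxError as err:
        return False, err.lineno


def token_level_rewards(
    code: str, ok_reward: float = 1.0, error_penalty: float = -1.0
) -> List[float]:
    """Map the certificate to one reward per whitespace-separated token.

    Assumed scheme (illustrative only): every token is rewarded when the
    program parses; otherwise tokens on the offending line are penalized
    and all other tokens receive a neutral 0.0.
    """
    ok, error_line = symbolic_feedback(code)
    rewards: List[float] = []
    for line_number, line in enumerate(code.split("\n"), start=1):
        for _token in line.split():
            if ok:
                rewards.append(ok_reward)
            elif line_number == error_line:
                rewards.append(error_penalty)
            else:
                rewards.append(0.0)
    return rewards


if __name__ == "__main__":
    print(token_level_rewards("x = 1\ny = x + 2"))  # program that parses
    print(token_level_rewards("x = 1\ny = = 2"))    # syntax error on line 2
```

In a reinforcement learning setup, a per-token vector of this kind would replace a single scalar reward for the whole output, which is the contrast with sparse rewards that the abstract draws.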

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications