A First-Order Logic-Based Alternative to Reward Models in RLHF

Chunjin Jian; Xinhua Zhu

arXiv:2512.14100 · cs.LG · December 17, 2025

Open Access

TL;DR

This paper introduces a logic-similarity-based reward mechanism for RLHF that replaces traditional reward models, using formal logical consistency to improve alignment of language models with human preferences.

Contribution

The paper proposes S-GRPO, a supervised variant of GRPO that integrates logical consistency and joint optimization to improve model alignment and robustness.

Findings

01. S-GRPO outperforms standard supervised fine-tuning in both performance and robustness.

02. The method extends preference-learning frameworks such as GRPO and DPO.

03. The approach offers a flexible, task-adaptive method for alignment training.

Abstract

Reinforcement Learning from Human Feedback (RLHF) plays a crucial role in aligning large language models (LLMs) with human values and preferences. However, the quality and stability of the trained reward model largely determine the final alignment performance. Existing approaches such as Proximal Policy Optimization (PPO) rely heavily on reward models to guide LLMs toward human-aligned behaviors. In this work, we propose a logic-similarity-based reward mechanism as an alternative to conventional reward modeling. Instead of relying on heuristic reward estimation, our method leverages formal logical consistency to steer model alignment with human preferences. Since real-world questions can be interpreted from multiple perspectives, to ensure that logic-based reinforcement learning does not cause model collapse, we introduce S-GRPO, a supervised variant of the GRPO framework. S-GRPO…
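
The abstract does not say how logic similarity is computed, so the following is only a minimal sketch of how a logic-based score could stand in for a reward model inside a GRPO-style update. It assumes each response has already been translated into a set of first-order atoms, and it uses Jaccard overlap as a placeholder similarity. The names `logic_similarity` and `group_relative_advantages` are hypothetical and are not taken from the paper. The group-relative normalization (reward minus group mean, divided by group standard deviation) follows the usual GRPO formulation.

```python
from statistics import mean, pstdev


def logic_similarity(pred_atoms, ref_atoms):
    """Placeholder reward: Jaccard overlap between two sets of logical atoms.

    This is an assumed stand-in, not the similarity measure used in the paper.
    """
    pred, ref = set(pred_atoms), set(ref_atoms)
    if not pred and not ref:
        return 1.0  # two empty formulas are treated as identical
    return len(pred & ref) / len(pred | ref)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: one reference answer and a group of three sampled responses,
# each already parsed into first-order atoms (the parsing step is out of scope here).
reference = {"Bird(tweety)", "Flies(tweety)"}
group = [
    {"Bird(tweety)", "Flies(tweety)"},  # matches the reference exactly
    {"Bird(tweety)"},                   # partially consistent
    {"Fish(tweety)"},                   # no overlap
]

rewards = [logic_similarity(g, reference) for g in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the responses whose atoms agree more closely with the reference receive positive advantages and the rest receive negative ones, so no learned reward model is involved. How S-GRPO combines this signal with its supervised term is not described in the excerpt above and is left out.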

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Recommender Systems and Techniques