Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models
Alexandr Plashchinsky

TL;DR
This paper presents PGSRM, a simple and effective embedding-based reward framework for reinforcement learning in language models, replacing complex reward signals with cosine similarity for improved stability and alignment.
Contribution
PGSRM introduces a lightweight, embedding-based reward method that requires no human annotation and no additional reward-model training, with the aim of making RL-based alignment of language models more stable and efficient.
Findings
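Across five language tasks, PGSRM yields smoother reward improvement than a binary reward baseline.
It produces more stable PPO training dynamics than the same baseline.
The results suggest it is a practical alternative to RLHF-style reward modeling for parent-guided alignment of smaller transformer models.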
Abstract
We introduce the Parent-Guided Semantic Reward Model (PGSRM), a lightweight reward framework for reinforcement learning (RL) of transformer language models. PGSRM replaces binary correctness signals, human preference data, and trained reward models with a simple signal: the cosine similarity between the embedding of a parent model's reference output and the embedding of a child model's generated output for the same input. This yields a dense, semantically meaningful reward with no human annotation or additional model training. We apply PGSRM to five language tasks and find that it produces smoother reward improvement and more stable PPO dynamics than a binary reward baseline, suggesting that embedding-based semantic rewards are a practical alternative to RLHF-style reward modeling for parent-guided alignment in smaller transformer models.
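The reward signal described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for whatever embedding model is actually used, and the names `pgsrm_reward` and `binary_reward` are chosen here for illustration only. The sketch contrasts the dense cosine-similarity reward with a binary exact-match reward.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an empty output earns zero reward
    return dot / (norm_u * norm_v)


def pgsrm_reward(parent_output, child_output, embed=toy_embed):
    """Dense reward: similarity of parent and child output embeddings."""
    return cosine_similarity(embed(parent_output), embed(child_output))


def binary_reward(parent_output, child_output):
    """Sparse baseline: 1.0 only on an exact match, otherwise 0.0."""
    return 1.0 if parent_output.strip() == child_output.strip() else 0.0


if __name__ == "__main__":
    parent = "the cat sat"
    child = "the cat slept"
    print(pgsrm_reward(parent, child))   # partial credit for a near miss
    print(binary_reward(parent, child))  # no credit for the same near miss
```

With the toy embedding, a child output that shares two of three words with the parent output receives partial credit, while the binary baseline gives it nothing. In practice the embedding function would be replaced by a learned sentence-embedding model, so that the reward reflects semantic rather than lexical overlap.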
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Digital Mental Health Interventions
