Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales

Ju-Seung Byun; Andrew Perrault

arXiv:2405.17618 · cs.LG · June 24, 2025

TL;DR

This paper introduces a symmetric reinforcement learning loss inspired by supervised learning techniques, significantly improving training stability and performance across diverse tasks, model scales, and feedback scenarios.

Contribution

The paper adapts the reverse cross entropy (RCE) loss to reinforcement learning, improving robustness and stability, especially with noisy data and in large language model fine-tuning.

Findings

1. Improved performance in Atari, MuJoCo, and Box2D tasks.
2. Enhanced RLHF results in language models for sentiment and summarization.
3. Notable stability gains with symmetric loss across hyperparameters.

Abstract

Reinforcement learning (RL) training is inherently unstable due to factors such as moving targets and high gradient variance. Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) can introduce additional difficulty. Differing preferences can complicate the alignment process, and prediction errors in a trained reward model can become more severe as the LLM generates unseen outputs. To enhance training robustness, RL has adopted techniques from supervised learning, such as ensembles and layer normalization. In this work, we improve the stability of RL training by adapting the reverse cross entropy (RCE) from supervised learning for noisy data to define a symmetric RL loss. We demonstrate performance improvements across various tasks and scales. We conduct experiments in discrete action tasks (Atari games) and continuous action space tasks…
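The reverse cross entropy mentioned in the abstract comes from noisy-label classification, where it is added to the usual cross entropy to form a symmetric loss. The sketch below illustrates only that supervised-learning form, not the paper's RL loss, which the abstract does not spell out. The names `alpha`, `beta`, and `log_zero`, and their default values, are illustrative assumptions; `log_zero` is the finite constant that stands in for log 0 on the one-hot label.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Standard CE against a one-hot label: -log p(label)."""
    return -math.log(probs[label])


def reverse_cross_entropy(probs, label, log_zero=-4.0):
    """RCE: -sum_k p(k) * log q(k), with q one-hot on `label`.

    log q(label) = log 1 = 0; every other log q(k) = log 0 is replaced
    by the finite constant `log_zero` (an assumed value here).
    """
    return -sum(p * (0.0 if k == label else log_zero)
                for k, p in enumerate(probs))


def symmetric_cross_entropy(logits, label, alpha=1.0, beta=1.0, log_zero=-4.0):
    """Weighted sum of CE and RCE (weights are assumed hyperparameters)."""
    probs = softmax(logits)
    ce = cross_entropy(probs, label)
    rce = reverse_cross_entropy(probs, label, log_zero)
    return alpha * ce + beta * rce


if __name__ == "__main__":
    # A confidently wrong prediction: CE grows without bound,
    # while RCE stays at or below -log_zero.
    wrong = softmax([-50.0, 50.0])
    print("CE :", cross_entropy(wrong, 0))
    print("RCE:", reverse_cross_entropy(wrong, 0))
```

With a one-hot label, the RCE term reduces to `-log_zero * (1 - p(label))`, so it is bounded no matter how wrong the prediction is. That boundedness is what makes the term tolerant of mislabeled examples in the supervised setting.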

Code & Models

Repositories

shashacks/symmetric_rl (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: A2C · Entropy Regularization · Proximal Policy Optimization