OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Jian Hu; Xibin Wu; Wei Shen; Jason Klein Liu; Zilin Zhu; Weixun Wang; Songlin Jiang; Haoran Wang; Hao Chen; Bin Chen; Weikai Fang; Xianyu; Yu Cao; Haotian Xu; Yiming Liu

arXiv:2405.11143 · cs.AI · October 10, 2025 · 1 cites

TL;DR

OpenRLHF is a user-friendly, scalable, and high-performance open-source framework for Reinforcement Learning from Human Feedback, designed to improve accessibility and efficiency in training large language models.

Contribution

The paper introduces a simplified, well-structured RLHF framework built on Ray, vLLM, DeepSpeed, and HuggingFace Transformers, with faster training and simpler implementation than existing solutions.

Findings

01 Achieves a 1.22x to 1.68x speedup over state-of-the-art frameworks

02 Requires fewer lines of code for implementation

03 Facilitates entry for researchers and practitioners

Abstract

Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of human-AI values, further raising the upper bound of AI capabilities, particularly in reasoning-intensive, long-context Chain-of-Thought (CoT) tasks. However, existing frameworks commonly face challenges such as inference bottlenecks and complexity barriers, which restrict their accessibility to newcomers. To bridge this gap, we introduce OpenRLHF, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, clear code structure, and comprehensive documentation to facilitate entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superior training…

Code & Models

Models

Jennny/llama3-1-8b-bb-rm (Hugging Face model)

Taxonomy

Topics: Distributed and Parallel Computing Systems · Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques

Methods: Direct Preference Optimization