G-Core: A Simple, Scalable and Balanced RLHF Trainer

Junyu Wu; Weiming Chang; Xiaotao Liu; Guanyou He; Haoqiang Hong; Boqi Liu; Hongtao Tian; Tao Yang; Yunsheng Shi; Feng Lin; Ting Yao

arXiv:2507.22789 · cs.LG · August 1, 2025

TL;DR

G-Core is a new RLHF training framework that improves scalability, flexibility, and efficiency when training large language models. It does this by introducing parallel controllers and dynamic resource placement. The authors demonstrate it on real-world applications.

Contribution

G-Core offers a novel parallel controller programming model and adaptive resource placement schema to improve RLHF training scalability and efficiency in complex, dynamic workflows.

Findings

1. Successfully trained models that support WeChat product features serving a large user base.
2. Reduced hardware idle time and improved resource utilization.
3. Enhanced scalability and robustness in real-world RLHF scenarios.

Abstract

Reinforcement Learning from Human Feedback (RLHF) has become an increasingly popular paradigm for training large language models (LLMs) and diffusion models. While existing RLHF training systems have enabled significant progress, they often face challenges in scaling to multi-modal and diffusion workflows and adapting to dynamic workloads. In particular, current approaches may encounter limitations in controller scalability, flexible resource placement, and efficient orchestration when handling complex RLHF pipelines, especially in scenarios involving dynamic sampling or generative reward modeling. In this paper, we present G-Core, a simple, scalable, and balanced RLHF training framework designed to address these challenges. G-Core introduces a parallel controller programming model, enabling flexible and efficient orchestration of complex RLHF workflows without the bottlenecks…

Taxonomy

Topics: Real-time simulation and control systems