G-Core: A Simple, Scalable and Balanced RLHF Trainer
Junyu Wu, Weiming Chang, Xiaotao Liu, Guanyou He, Haoqiang Hong, Boqi Liu, Hongtao Tian, Tao Yang, Yunsheng Shi, Feng Lin, Ting Yao

TL;DR
G-Core is a new RLHF training framework that enhances scalability, flexibility, and efficiency for large language models by introducing parallel control and dynamic resource management, demonstrated on real-world applications.
Contribution
G-Core offers a novel parallel controller programming model and adaptive resource placement schema to improve RLHF training scalability and efficiency in complex, dynamic workflows.
Findings
Successfully trained models for WeChat features with a large user base
Reduced hardware idle time and improved resource utilization
Enhanced scalability and robustness in real-world RLHF scenarios
Abstract
Reinforcement Learning from Human Feedback (RLHF) has become an increasingly popular paradigm for training large language models (LLMs) and diffusion models. While existing RLHF training systems have enabled significant progress, they often face challenges in scaling to multi-modal and diffusion workflows and adapting to dynamic workloads. In particular, current approaches may encounter limitations in controller scalability, flexible resource placement, and efficient orchestration when handling complex RLHF pipelines, especially in scenarios involving dynamic sampling or generative reward modeling. In this paper, we present G-Core, a simple, scalable, and balanced RLHF training framework designed to address these challenges. G-Core introduces a parallel controller programming model, enabling flexible and efficient orchestration of complex RLHF workflows without the bottlenecks…
Taxonomy
Topics: Real-time simulation and control systems
