ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback

Zhenyu Hou; Yilin Niu; Zhengxiao Du; Xiaohan Zhang; Xiao Liu; Aohan Zeng; Qinkai Zheng; Minlie Huang; Hongning Wang; Jie Tang; Yuxiao Dong

arXiv:2404.00934·cs.CL·April 4, 2024·3 cites

Open Access

TL;DR

This paper details the development and implementation of ChatGLM-RLHF, a reinforcement learning from human feedback system designed to improve the alignment of ChatGLM large language models with human preferences, addressing unique challenges in large-scale training.

Contribution

It introduces novel strategies for stabilizing RLHF training, including reward variance mitigation, model parallelism, and regularization to prevent catastrophic forgetting in LLMs.

Findings

01 ChatGLM-RLHF improves alignment performance by 15% over ChatGLM-SFT.

02 The system effectively addresses challenges in large-scale RLHF training.

03 Experimental results demonstrate enhanced alignment in Chinese tasks.

Abstract

ChatGLM is a free-to-use AI service powered by the ChatGLM family of large language models (LLMs). In this paper, we present the ChatGLM-RLHF pipeline -- a reinforcement learning from human feedback (RLHF) system -- designed to enhance ChatGLM's alignment with human preferences. ChatGLM-RLHF encompasses three major components: the collection of human preference data, the training of the reward model, and the optimization of policies. While integrating ChatGLM-RLHF into production, we encountered and addressed several unprecedented challenges. We introduce strategies to mitigate reward variance for stabilized large-scale training, implement model parallelism with fused gradient descent, and design regularization constraints to avoid catastrophic forgetting in LLMs. Experiments show that ChatGLM-RLHF brings significant improvements in alignment tasks compared to…
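The reward-model component mentioned in the abstract is commonly trained on pairs of responses ranked by annotators. The sketch below illustrates one widely used objective for that setting, a Bradley-Terry style pairwise loss. It is an illustration under assumptions, not the paper's confirmed implementation: the function names and the example scores are hypothetical, and the scalar rewards stand in for the outputs of a learned reward model.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper: the loss is small when the preferred response
    scores higher than the rejected one, and large otherwise.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() is never called with a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs):
    """Average the pairwise loss over (chosen, rejected) reward pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up reward scores for three annotated comparisons.
    example_pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
    print(mean_pairwise_loss(example_pairs))
```

In a real training loop the two rewards would be differentiable model outputs and the loss would be minimized by gradient descent. The scalar version here only shows the shape of the objective.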

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques