Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF

Chen Zheng; Ke Sun; Hang Wu; Chenguang Xi; Xun Zhou

arXiv:2403.02513 · cs.CL · March 6, 2024 · 2 citations

Open Access

TL;DR

This paper introduces a novel approach for improving conversational LLMs that applies Harmless RLHF directly, bypassing traditional supervised fine-tuning. The approach preserves the base model's capabilities and reduces toxicity, as demonstrated on the Mistral model.

Contribution

The paper proposes a direct RLHF method that enhances the conversational abilities and safety of LLMs while avoiding the drawbacks of supervised fine-tuning, and applies it to the Mistral model.

Findings

01

Mistral-Plus outperforms similar-sized open-source models.

02

Significant improvement in conversational abilities.

03

Reduced toxic output generation.

Abstract

Recent advances in conversational Large Language Models (LLMs) have revealed a concerning trend: many new base LLMs lose some of their foundational capabilities after Supervised Fine-Tuning (SFT). This process often leads to forgetting or to a general decline in the base model's abilities. Moreover, fine-tuned models struggle to align with user preferences and, when specifically prompted, can inadvertently generate more toxic outputs. To overcome these challenges, we bypass SFT entirely and directly apply Harmless Reinforcement Learning from Human Feedback (RLHF). Our method not only preserves the base model's general capabilities but also significantly enhances its conversational abilities, while notably reducing the generation of toxic outputs. Our approach holds significant…
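
The summary above does not state the exact training objective. As a hedged illustration only, the sketch below shows the KL-regularized reward that is common in PPO-style RLHF pipelines: a response is scored by a reward model (here, presumably one trained for harmlessness and helpfulness), minus a penalty for drifting away from a frozen reference model. With SFT skipped, that reference would presumably be the base model itself, which is one standard way such a setup limits capability loss. The function name `kl_penalized_reward`, the reward score, and the default `beta` are hypothetical, not details taken from the paper.

```python
from typing import Sequence


def kl_penalized_reward(
    reward_score: float,
    policy_logprobs: Sequence[float],
    reference_logprobs: Sequence[float],
    beta: float = 0.1,  # hypothetical KL coefficient, not from the paper
) -> float:
    """Sequence-level RLHF reward for one sampled response (illustrative sketch).

    reward_score: scalar from an assumed harmlessness/helpfulness reward model.
    policy_logprobs: per-token log-probs of the sampled response under the policy being trained.
    reference_logprobs: per-token log-probs of the same tokens under the frozen reference
        (assumed here to be the base model, since SFT is skipped).
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    # Single-sample estimate of KL(policy || reference) on the sampled tokens:
    # sum over t of [log pi(y_t | ...) - log pi_ref(y_t | ...)]
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, reference_logprobs))
    # Reward-model score minus the drift penalty.
    return reward_score - beta * kl_estimate


# Usage: the policy assigns slightly higher probability to its own tokens than the reference does.
shaped = kl_penalized_reward(1.0, [-0.5, -1.0], [-0.7, -1.2], beta=0.1)
print(round(shaped, 4))
```

A larger `beta` pulls the trained policy closer to the reference distribution at the cost of optimizing the reward-model score less aggressively; a smaller `beta` does the opposite.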

Code & Models

Models

zhengchenphd/Mistral-Plus-7B (Hugging Face model · 272 downloads · 4 likes)

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling

Methods: Shrink and Fine-Tune · ALIGN · Balanced Selection