Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
Jinluan Yang, Dingnan Jin, Anke Tang, Li Shen, Didi Zhu, Zhengyu Chen, Ziyu Zhao, Daixin Wang, Qing Cui, Zhiqiang Zhang, Jun Zhou, Fei Wu, Kun Kuang

TL;DR
This paper compares data mixture and model merging strategies for aligning large language models with Helpfulness, Honesty, and Harmlessness, introducing RESM, a novel merging method that improves balance and robustness.
Contribution
It systematically evaluates merging and data mixture approaches for 3H alignment and proposes RESM, a new weighted model merging technique enhancing balance and robustness.
Findings
RESM outperforms previous methods with 2-5% gains.
Model merging and data mixture have distinct advantages and limitations.
Extensive evaluations confirm RESM's effectiveness and robustness.
Abstract
Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI. Existing methods like data mixture strategies face limitations, including heavy reliance on expert knowledge and conflicting optimization signals. While model merging offers parameter-level conflict-resolution strategies through integrating specialized models' parameters, its potential for 3H optimization remains underexplored. This paper systematically compares the effectiveness of model merging and data mixture methods in constructing 3H-aligned LLMs for the first time, revealing previously overlooked collaborative and conflict relationships among the 3H dimensions and discussing the advantages and drawbacks of data mixture (data-level) and model merging (parameter-level) methods in mitigating…
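To make "integrating specialized models' parameters" concrete, here is a minimal sketch of a generic weighted task-vector merge. It is not RESM, whose details are not given on this page; the function name `merge_task_vectors`, the toy parameter dictionaries, and the weights are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def merge_task_vectors(base: Params, experts: List[Params], weights: List[float]) -> Params:
    """Generic weighted merge: base + sum_i w_i * (expert_i - base).

    Illustrative only; this is a plain task-arithmetic style merge,
    not the RESM method described in the paper.
    """
    if len(experts) != len(weights):
        raise ValueError("need exactly one weight per expert model")
    merged: Params = {}
    for name, base_values in base.items():
        out = list(base_values)  # start from the shared base parameters
        for expert, w in zip(experts, weights):
            # Add the scaled difference between this expert and the base.
            for i, (e, b) in enumerate(zip(expert[name], base_values)):
                out[i] += w * (e - b)
        merged[name] = out
    return merged


# Hypothetical 3H experts fine-tuned from the same base model.
base = {"layer.weight": [0.0, 1.0]}
helpful = {"layer.weight": [1.0, 1.0]}
honest = {"layer.weight": [0.0, 2.0]}
harmless = {"layer.weight": [-1.0, 1.0]}

merged = merge_task_vectors(base, [helpful, honest, harmless], [0.5, 0.5, 0.5])
print(merged)
```

In this toy case the helpfulness and harmlessness updates point in opposite directions on the first coordinate and cancel out, which is one simple picture of the parameter-level conflicts the abstract refers to.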
Code & Models
- 🤗 yangjinluan/3H_Merging_Mistral_Honesty (model) · 4 dl
- 🤗 yangjinluan/3H_Merging_Mistral_Harmlessness (model)
- 🤗 yangjinluan/3H_Merging_Mistral_Helpfulness (model) · 1 dl
- 🤗 yangjinluan/3H_Merging_Mistral_Helpfulness_Honesty (model)
- 🤗 yangjinluan/3H_Merging_Mistral_Helpfulness_Harmlessness (model) · 1 dl
- 🤗 yangjinluan/3H_Merging_Llama3_Honesty (model) · 1 dl
- 🤗 yangjinluan/3H_Merging_Llama3_Helpfulness (model) · 4 dl
- 🤗 yangjinluan/3H_Merging_Llama3_Harmlessness (model) · 2 dl
- 🤗 yangjinluan/3H_Merging_Llama3_Helpfulness_Honesty (model) · 3 dl
- 🤗 yangjinluan/3H_Merging_Llama3_Helpfulness_Harmlessness (model) · 1 dl
Taxonomy
Topics: Topic Modeling
Methods: Pruning
