Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging

Jinluan Yang; Dingnan Jin; Anke Tang; Li Shen; Didi Zhu; Zhengyu Chen; Ziyu Zhao; Daixin Wang; Qing Cui; Zhiqiang Zhang; Jun Zhou; Fei Wu; Kun Kuang

arXiv:2502.06876·cs.CL·February 3, 2026

TL;DR

This paper compares data mixture and model merging strategies for aligning large language models with Helpfulness, Honesty, and Harmlessness, introducing RESM, a novel merging method that improves balance and robustness.

Contribution

The paper systematically evaluates model merging and data mixture approaches for 3H alignment and proposes RESM, a new weighted model merging technique that improves balance and robustness.

Findings

01. RESM outperforms previous methods with 2-5% gains.

02. Model merging and data mixture have distinct advantages and limitations.

03. Extensive evaluations confirm RESM's effectiveness and robustness.

Abstract

Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI. Existing methods like data mixture strategies face limitations, including heavy reliance on expert knowledge and conflicting optimization signals. While model merging offers parameter-level conflict-resolution strategies through integrating specialized models' parameters, its potential for 3H optimization remains underexplored. This paper systematically compares the effectiveness of model merging and data mixture methods in constructing 3H-aligned LLMs for the first time, revealing previously overlooked collaborative and conflict relationships among the 3H dimensions and discussing the advantages and drawbacks of data mixture (data-level) and model merging (parameter-level) methods in mitigating…
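To make "parameter-level" merging concrete, here is a minimal sketch of a generic weighted merge in the style of task arithmetic: each specialized model's offset from a shared base is scaled and added back onto the base. This is not RESM, whose details are not given in the text above. The function `merge_weighted`, the toy `Params` type, the parameter name `layer.weight`, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical toy "state dict": parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def merge_weighted(base: Params, experts: List[Params], weights: List[float]) -> Params:
    """Add a weighted sum of expert deltas (expert - base) onto the base weights."""
    if len(experts) != len(weights):
        raise ValueError("need exactly one weight per expert")
    merged: Params = {}
    for name, base_vals in base.items():
        out = list(base_vals)  # start from a copy of the base parameters
        for expert, w in zip(experts, weights):
            for i, (e, b) in enumerate(zip(expert[name], base_vals)):
                out[i] += w * (e - b)  # scaled offset of this expert from the base
        merged[name] = out
    return merged


# Illustrative values only: three models fine-tuned from the same base,
# standing in for helpfulness, honesty, and harmlessness specialists.
base = {"layer.weight": [0.0, 1.0]}
helpful = {"layer.weight": [1.0, 1.0]}
honest = {"layer.weight": [0.0, 2.0]}
harmless = {"layer.weight": [-1.0, 1.0]}

merged = merge_weighted(base, [helpful, honest, harmless], [0.5, 0.5, 0.5])
print(merged)  # {'layer.weight': [0.0, 1.5]}
```

In this toy case the first coordinate of the helpful and harmless offsets point in opposite directions and cancel, which is one simple way a conflict between objectives can show up at the parameter level. A data mixture approach would instead combine the training sets and fine-tune a single model.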

Taxonomy

Topics: Topic Modeling

Methods: Pruning