MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework

Yupeng Qi; Ziyu Lyu; Min Yang; Yanlin Wang; Lu Bai; Lixin Cui

arXiv:2506.02460·cs.CL·June 4, 2025

Open Access

TL;DR

MidPO introduces a Mixture of Experts framework with dynamic routing to balance safety and helpfulness in large language models, outperforming existing methods in both aspects.

Contribution

The paper presents a novel MoE-based approach with a dynamic routing mechanism for dual preference optimization in LLMs, balancing safety and helpfulness.
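As a rough illustration of what routing between a safety expert and a helpfulness expert could look like, the sketch below mixes two expert outputs with softmax weights. It is a generic soft-gating example, not the paper's implementation: the function names `softmax` and `route`, the two-logit router input, and the list-of-floats expert outputs are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, safety_out, helpful_out):
    """Blend two expert outputs using router weights.

    router_logits: two scores (hypothetical), one per expert, assumed to be
                   produced per input by a learned router.
    safety_out / helpful_out: same-length output vectors of the two experts.
    Returns the mixed output and the (safety, helpfulness) weights.
    """
    w_safe, w_help = softmax(router_logits)
    mixed = [w_safe * s + w_help * h for s, h in zip(safety_out, helpful_out)]
    return mixed, (w_safe, w_help)


# Equal logits give an even blend of the two experts.
mixed, weights = route([0.0, 0.0], [1.0, 3.0], [3.0, 1.0])
print(mixed, weights)
```

Because the weights depend on the router logits, and those are assumed to depend on the input, the blend can lean toward the safety expert for risky prompts and toward the helpfulness expert otherwise.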

Findings

01. MidPO significantly outperforms state-of-the-art methods in safety.
02. MidPO achieves better helpfulness without compromising safety.
03. Experimental results on three datasets validate the effectiveness of MidPO.

Abstract

As large language models (LLMs) are increasingly applied across various domains, enhancing safety while maintaining the helpfulness of LLMs has become a critical challenge. Recent studies address this problem through safety-constrained online preference optimization or safety-constrained offline preference optimization. However, the safety-constrained online methods often suffer from excessive safety, which might reduce helpfulness, while the safety-constrained offline methods perform poorly in adaptively balancing safety and helpfulness. To address these limitations, we propose MidPO, a Mixture of Experts (MoE) framework for safety-helpfulness dual Preference Optimization. Firstly, MidPO devises a single-preference enhanced direct preference optimization approach to transform the base model into two…
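For readers unfamiliar with direct preference optimization, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not the single-preference enhanced variant the abstract refers to, whose details are not given here. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for illustration.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under a frozen reference model.
    loss = -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy matches the reference, the loss equals log(2).
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
```

The loss falls as the policy raises the chosen response's log-probability relative to the reference more than it raises the rejected one's.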

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques