MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization

Yougang Lyu; Lingyong Yan; Zihan Wang; Dawei Yin; Pengjie Ren; Maarten de Rijke; Zhaochun Ren

arXiv:2410.07672 · cs.CL · March 4, 2025

TL;DR

MACPO is a multi-agent contrastive preference optimization framework for weak-to-strong alignment of large language models. It relies on mutual learning between weak teachers and strong students and on behavior augmentation, and it improves alignment performance over existing methods on the HH-RLHF and PKU-SafeRLHF datasets.

Contribution

The paper proposes MACPO, a framework for weak-to-strong LLM alignment. It addresses a gap left by existing methods, which mainly target strong-to-weak alignment and self-alignment, by enabling mutual learning between weak teachers and strong students.

Findings

01 MACPO improves alignment performance on the HH-RLHF and PKU-SafeRLHF datasets.

02 Mutual positive behavior augmentation enhances learning between agents.

03 Increasing the number of weak teachers yields better alignment as the number of training iterations grows.

Abstract

As large language models (LLMs) are rapidly advancing and achieving near-human capabilities on specific tasks, aligning them with human values is becoming more urgent. In scenarios where LLMs outperform humans, we face a weak-to-strong alignment problem where we need to effectively align strong student LLMs through weak supervision generated by weak teachers. Existing alignment methods mainly focus on strong-to-weak alignment and self-alignment settings, and it is impractical to adapt them to the much harder weak-to-strong alignment setting. To fill this gap, we propose a multi-agent contrastive preference optimization (MACPO) framework. MACPO enables weak teachers and strong students to learn from each other by iteratively reinforcing unfamiliar positive behaviors while penalizing familiar negative ones. To achieve this, we devise a mutual positive behavior augmentation strategy to…
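To make "reinforcing positive behaviors while penalizing negative ones" concrete, here is a minimal Python sketch of a DPO-style pairwise preference loss, one common form of contrastive preference objective. It is an illustration under stated assumptions, not the objective given in the paper: the function name `dpo_style_loss`, its parameters, and the default `beta` are hypothetical, and the inputs are assumed to be sequence log-probabilities of a preferred (positive) and a dispreferred (negative) response under the trained policy and a frozen reference model.

```python
import math


def dpo_style_loss(policy_logp_pos, policy_logp_neg,
                   ref_logp_pos, ref_logp_neg, beta=0.1):
    """Illustrative DPO-style loss for one (positive, negative) response pair.

    All names and the default beta are assumptions for this sketch.
    The loss is -log(sigmoid(margin)), where the margin measures how much
    more the policy favors the positive response over the negative one,
    relative to the reference model.
    """
    margin = beta * ((policy_logp_pos - ref_logp_pos)
                     - (policy_logp_neg - ref_logp_neg))
    # Numerically stable evaluation of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the margin is zero.
baseline = dpo_style_loss(-1.0, -1.0, -1.0, -1.0)

# Policy raises the positive response and lowers the negative one.
improved = dpo_style_loss(-0.5, -2.0, -1.0, -1.0)

print(baseline, improved)
```

In this sketch the loss falls when the policy assigns relatively more probability to the positive response and less to the negative one, which is the direction of update the abstract describes. How MACPO constructs the positive and negative samples is not covered by the visible text, so the sketch leaves that part out.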

Taxonomy

Topics: Advanced Manufacturing and Logistics Optimization

Methods: ALIGN · Focus