Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment
Wenzhe Xu, Biao Liu, Yiyang Sun, Xin Geng, Ning Xu

TL;DR
Meta-Aligner introduces a bidirectional meta-learning framework for multi-objective LLM alignment, dynamically optimizing preferences and responses to better handle conflicting human values.
Contribution
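It proposes Meal (MEta ALigner), a bi-level meta-learning framework in which a preference-weight-net acts as the meta-learner, generating adaptive preference weights from input prompts, so that preferences and policy responses are optimized bidirectionally rather than against fixed preference targets.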
Findings
Achieves superior performance on multi-objective benchmarks.
Demonstrates the effectiveness of dynamic bidirectional preference-policy optimization.
Validates the approach's ability to handle conflicting human values.
Abstract
Multi-Objective Alignment aims to align Large Language Models (LLMs) with diverse and often conflicting human values by optimizing multiple objectives simultaneously. Existing methods predominantly rely on static preference weight construction strategies. However, rigidly aligning to fixed targets discards valuable intermediate information, as training responses inherently embody valid preference trade-offs even when deviating from the target. To address this limitation, we propose Meal, i.e., MEta ALigner, a bi-level meta-learning framework enabling bidirectional optimization between preferences and policy responses, generating instructive dynamic preferences for steadier training. Specifically, we introduce a preference-weight-net as a meta-learner to generate adaptive preference weights based on input prompts and update the preference weights as learnable parameters, while the LLM…
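To make the idea of prompt-conditioned preference weights concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the linear-plus-softmax form of the preference-weight-net, the function names (`preference_weights`, `scalarized_reward`), the feature vector, and all numbers are assumptions chosen for illustration. It shows only how a small network could map prompt features to weights over objectives and how those weights could combine per-objective rewards; the bi-level meta-update described in the abstract is not modeled.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Map arbitrary scores to positive weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preference_weights(
    prompt_features: List[float],
    weight_matrix: List[List[float]],
    bias: List[float],
) -> List[float]:
    """Hypothetical preference-weight-net: one linear layer followed by softmax.

    Each row of weight_matrix corresponds to one alignment objective.
    """
    logits = [
        sum(w * x for w, x in zip(row, prompt_features)) + b
        for row, b in zip(weight_matrix, bias)
    ]
    return softmax(logits)


def scalarized_reward(weights: List[float], rewards: List[float]) -> float:
    """Combine per-objective rewards into one scalar using the weights."""
    return sum(w * r for w, r in zip(weights, rewards))


# Illustrative values only (assumed, not taken from the paper).
features = [0.2, -1.0, 0.5]          # stand-in for a prompt embedding
W = [[0.1, 0.0, 0.3],                # objective 1 (e.g. helpfulness)
     [-0.2, 0.4, 0.0]]               # objective 2 (e.g. harmlessness)
b = [0.0, 0.0]

weights = preference_weights(features, W, b)
reward = scalarized_reward(weights, [0.8, 0.3])
```

Because the weights lie on the probability simplex, the combined reward is always a convex combination of the per-objective rewards, which is one common way to express a preference trade-off between conflicting objectives.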
