VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models
Hefei Xu, Le Wu, Yu Wang, Min Hou, Han Wu, Zhen Zhang, Meng Wang

TL;DR
VC-soup introduces a value-consistency-based data filtering and model merging approach to improve multi-value alignment in large language models, addressing both value conflicts and the cost of training separate models.
Contribution
It proposes a novel framework that leverages value-consistency metrics and Pareto filtering to improve multi-value alignment without training a separate model for each value combination.
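The summary does not say how VC-soup's Pareto filtering is implemented. The sketch below shows only the generic idea of Pareto filtering and is an assumption, not the authors' code. Each candidate (for example, a data sample or a merged model) is assumed to carry one score per human value, with higher being better. A candidate is kept only if no other candidate is at least as good on every value and strictly better on at least one. The function names and the example scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is >= `b` on every value score and > on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_filter(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates (the Pareto front)."""
    keep = []
    for i, s in enumerate(scores):
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i):
            keep.append(i)
    return keep


# Hypothetical per-value scores (two values) for four candidates.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
print(pareto_filter(candidates))  # prints [0, 1, 3]; (0.5, 0.5) is dominated by (0.6, 0.6)
```

This quadratic pairwise check is the simplest correct version. It keeps both the extreme trade-offs and the balanced candidate, which matches the stated goal of balanced multi-value performance.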
Findings
Outperforms existing multi-value alignment methods in experiments.
Effectively mitigates conflicts among diverse human values.
Produces balanced multi-value performance through Pareto filtering.
Abstract
As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human values has become a central objective in trustworthy AI. This challenge becomes even more pronounced when aligning multiple, potentially conflicting human values. Although recent approaches, such as reward reweighting, prompt-based supervised fine-tuning, and model merging, attempt to tackle multi-value alignment, they still face two major limitations: (1) training separate models for each value combination is prohibitively expensive; (2) value conflicts substantially degrade alignment performance. These limitations make it difficult to achieve favorable trade-offs across diverse human values. To address these challenges, we revisit multi-value alignment from the perspective of value consistency in data and propose VC-soup, a data filtering and…
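The abstract (truncated above) names model merging as one building block, but the summary does not specify VC-soup's merging rule. The sketch below is a generic weighted linear average of parameters, in the spirit of "model soups". It assumes every model exposes the same parameter names and shapes. Parameters are shown as plain Python lists to keep the example dependency-free, and the model names, keys, and weights are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def merge_models(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted linear average of parameters across models with identical keys and shapes."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so the merge is a convex combination
    merged: Params = {}
    for name in models[0]:
        size = len(models[0][name])
        merged[name] = [
            sum(w * m[name][k] for w, m in zip(norm, models)) for k in range(size)
        ]
    return merged


# Two hypothetical single-value-aligned models.
value_a_model = {"layer.weight": [1.0, 0.0], "layer.bias": [0.5]}
value_b_model = {"layer.weight": [0.0, 1.0], "layer.bias": [-0.5]}
soup = merge_models([value_a_model, value_b_model], [0.5, 0.5])
print(soup)  # {'layer.weight': [0.5, 0.5], 'layer.bias': [0.0]}
```

Changing the weights moves the merged model along a trade-off between the source models without any retraining. Avoiding retraining is the efficiency argument the abstract makes against training a separate model per value combination.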
