Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation

Hefei Xu; Le Wu; Chen Cheng; Hao Liu

arXiv:2511.17579 · cs.LG · November 25, 2025

TL;DR

This paper introduces Multi-Value Alignment (MVA), a framework for large language models that reduces interference between values during alignment and uses extrapolation to explore diverse trade-offs among potentially conflicting human values.

Contribution

The paper proposes a new multi-value alignment method that reduces parameter interference and employs value extrapolation to better balance conflicting human values.

Findings

01. Outperforms existing methods in multi-value alignment tasks
02. Effectively handles conflicting human values
03. Constructs diverse LLMs with optimized value trade-offs

Abstract

With the rapid advancement of large language models (LLMs), aligning them with human values for safety and ethics has become a critical challenge. This problem is especially challenging when multiple, potentially conflicting human values must be considered and balanced. Although several variants of existing alignment methods (such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO)) have been proposed to address multi-value alignment, they suffer from notable limitations: 1) they are often unstable and inefficient in multi-value optimization; and 2) they fail to effectively handle value conflicts. As a result, these approaches typically struggle to achieve optimal trade-offs when aligning multiple values. To address this challenge, we propose a novel framework called Multi-Value Alignment (MVA). It mitigates alignment degradation caused by…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing