Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion

Yuanhong Wu, Djallel Bouneffouf, and D. Frank Hsu

arXiv:2603.11126 · cs.MA · March 13, 2026

Open Access

TL;DR

This paper introduces VAS-CFA, a multi-agent fusion framework that enhances LLM value alignment by integrating diverse normative perspectives, outperforming previous single-agent and aggregation methods.

Contribution

It presents a novel multi-agent fusion approach that uses Combinatorial Fusion Analysis (CFA) to improve LLM alignment with human values, addressing the limitations of existing single-evaluator methods.

Findings

01 VAS-CFA outperforms single-agent baselines

02 Multi-agent fusion improves alignment robustness

03 Empirical results show higher scores on standard metrics

Abstract

Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that operationalizes multi-agent fusion alignment. It instantiates multiple moral agents, each fine-tuned to represent a distinct normative perspective, and fuses their outputs using CFA with both rank- and score-based aggregation. This design leverages cognitive diversity among agents to mitigate conflicts and redundancies between them, producing responses that better reflect human values.…
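
The rank- and score-based aggregation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas, so the min-max normalization, the plain averaging of scores and ranks, and the root-mean-square distance between sorted score curves used as a stand-in for cognitive diversity are all assumptions. Every function name and the example scores are hypothetical.

```python
from math import sqrt

# Hypothetical sketch: each "agent" scores the same list of candidate responses.
# All names and formulas here are illustrative assumptions, not the paper's code.

def normalize(scores):
    """Min-max scale scores to [0, 1]; a constant list maps to 0.5 (assumption)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def ranks(scores):
    """Rank 1 is the highest score; ties are broken by candidate index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    result = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result

def score_combination(agent_scores):
    """Average the normalized scores of all agents per candidate (higher is better)."""
    normed = [normalize(s) for s in agent_scores]
    n = len(normed[0])
    return [sum(a[i] for a in normed) / len(normed) for i in range(n)]

def rank_combination(agent_scores):
    """Average the ranks of all agents per candidate (lower is better)."""
    all_ranks = [ranks(s) for s in agent_scores]
    n = len(all_ranks[0])
    return [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n)]

def cognitive_diversity(scores_a, scores_b):
    """Assumed diversity measure: RMS distance between the two agents'
    normalized scores after sorting each in descending order."""
    fa = sorted(normalize(scores_a), reverse=True)
    fb = sorted(normalize(scores_b), reverse=True)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa))

# Made-up scores from three hypothetical moral agents for three candidate responses.
agent_scores = [
    [9, 4, 1],
    [2, 8, 5],
    [6, 7, 3],
]

fused_scores = score_combination(agent_scores)
fused_ranks = rank_combination(agent_scores)
best_by_score = max(range(len(fused_scores)), key=lambda i: fused_scores[i])
best_by_rank = min(range(len(fused_ranks)), key=lambda i: fused_ranks[i])
print("best candidate by score combination:", best_by_score)
print("best candidate by rank combination:", best_by_rank)
print("diversity(agent 0, agent 1):", cognitive_diversity(agent_scores[0], agent_scores[1]))
```

In this sketch the two fusion modes can disagree on other inputs, which is one reason to compute both; how VAS-CFA selects or weights between them is not specified in the text above.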

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI