Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion
Yuanhong Wu, Djallel Bouneffouf, and D. Frank Hsu

TL;DR
This paper introduces VAS-CFA, a multi-agent fusion framework that enhances LLM value alignment by integrating diverse normative perspectives, outperforming previous single-agent and aggregation methods.
Contribution
It presents a novel multi-agent fusion approach that uses Combinatorial Fusion Analysis (CFA) to improve LLM alignment with human values, addressing the limitations of existing single-evaluator methods.
Findings
VAS-CFA outperforms single-agent baselines
Multi-agent fusion improves alignment robustness
Empirical results show higher scores on standard metrics than prior single-agent and aggregation methods
Abstract
Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that operationalizes multi-agent fusion alignment. It instantiates multiple moral agents, each fine-tuned to represent a distinct normative perspective, and fuses their outputs using CFA with both rank- and score-based aggregation. This design leverages cognitive diversity between agents to mitigate conflicts and redundancies among them, producing responses that better reflect human values.…
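To make the rank- and score-based aggregation mentioned in the abstract more concrete, the following is a minimal sketch of the two basic combination modes, assuming simple unweighted averaging. It is an illustration only, not the authors' implementation: the function names, the min-max normalization, and the example scores are all hypothetical, and the paper may weight agents differently (for example by cognitive diversity).

```python
from typing import List


def normalize(scores: List[float]) -> List[float]:
    """Min-max normalize one agent's scores to [0, 1] (assumed preprocessing)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates tied: give each the same neutral score.
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def to_ranks(scores: List[float]) -> List[int]:
    """Convert scores to ranks; rank 1 is the highest score, ties broken by index."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def score_combination(agent_scores: List[List[float]]) -> List[float]:
    """Average the normalized scores across agents (higher is better)."""
    normed = [normalize(s) for s in agent_scores]
    n = len(normed[0])
    return [sum(a[i] for a in normed) / len(normed) for i in range(n)]


def rank_combination(agent_scores: List[List[float]]) -> List[float]:
    """Average the per-agent ranks across agents (lower is better)."""
    ranked = [to_ranks(s) for s in agent_scores]
    n = len(ranked[0])
    return [sum(a[i] for a in ranked) / len(ranked) for i in range(n)]


# Hypothetical example: three moral agents each score three candidate responses.
agent_scores = [
    [0.9, 0.2, 0.5],  # agent 1
    [0.4, 0.8, 0.6],  # agent 2
    [0.7, 0.1, 0.9],  # agent 3
]
fused_scores = score_combination(agent_scores)
fused_ranks = rank_combination(agent_scores)
best_by_score = max(range(len(fused_scores)), key=lambda i: fused_scores[i])
best_by_rank = min(range(len(fused_ranks)), key=lambda i: fused_ranks[i])
```

In this toy example no single agent dominates, yet both combinations select the same candidate. In general the two modes can disagree, which is one reason to compute both.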
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
