Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization
Subhojyoti Mukherjee, Anusha Lalitha, Sailik Sengupta, Aniket Deshmukh, Branislav Kveton

TL;DR
This paper introduces HaM, an efficient algorithm for multi-objective alignment of large language models that maximizes hypervolume to produce diverse, high-quality solutions covering complex human preferences.
Contribution
The paper presents the first application of a-posteriori multi-objective optimization (MOO) to multi-objective alignment from human feedback (MOAHF) in LLMs, producing a set of diverse solutions without requiring human preferences to be known in advance.
Findings
HaM outperforms existing methods across multiple objectives.
It covers the Pareto front with multiple diverse policies while remaining computationally and space efficient.
Empirical results show improvements on objectives including harmlessness, helpfulness, and faithfulness.
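To make the term "Pareto front" concrete, here is a minimal sketch, not taken from the paper. It assumes each candidate policy has already been scored on every objective, with higher scores being better; the function names `dominates` and `pareto_front` and the example scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (helpfulness, harmlessness) scores for five candidate policies.
scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.1, 0.1)]
front = pareto_front(scores)
print(front)
```

In this toy example, `(0.4, 0.4)` and `(0.1, 0.1)` are dominated by `(0.5, 0.5)`, so only the three trade-off points remain on the front.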
Abstract
Multi-objective alignment from human feedback (MOAHF) in large language models (LLMs) is a challenging problem as human preferences are complex, multifaceted, and often conflicting. Recent works on MOAHF considered a-priori multi-objective optimization (MOO), where human preferences are known at training or inference time. In contrast, when human preferences are unknown or difficult to quantify, a natural approach is to cover the Pareto front by multiple diverse solutions. We propose an algorithm HaM for learning diverse LLM policies that maximizes their hypervolume. This is the first application of a-posteriori MOO to MOAHF. HaM is computationally and space efficient, and empirically superior across objectives such as harmlessness, helpfulness, humor, faithfulness, and hallucination, on various datasets.
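The hypervolume that the abstract refers to can be illustrated with a small sketch. This is the standard two-objective hypervolume indicator under maximization, not HaM itself, which trains LLM policies. The function name `hypervolume_2d`, the reference point, and the scores are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def hypervolume_2d(points: Iterable[Tuple[float, float]],
                   ref: Tuple[float, float] = (0.0, 0.0)) -> float:
    """Area dominated by `points` and bounded below by the reference
    point `ref`, for two objectives that are both maximized."""
    # Only points strictly better than the reference point contribute.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # Add the horizontal strip not yet covered by earlier points.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Hypothetical (helpfulness, harmlessness) scores for two sets of policies.
diverse = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9)]
clustered = [(0.52, 0.48), (0.5, 0.5), (0.48, 0.52)]
print(hypervolume_2d(diverse) > hypervolume_2d(clustered))
```

The spread-out set covers more area than the clustered one, which is why maximizing hypervolume tends to favor a set of solutions that is both high-quality and diverse along the Pareto front.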
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification
