EvoPref: Multi-Objective Evolutionary Optimization Discovers Diverse LLM Alignments Beyond Gradient Descent
Dongxin Guo, Jikun Wu, Siu Ming Yiu

TL;DR
EvoPref uses multi-objective evolutionary algorithms to discover more diverse and robust LLM alignments than traditional gradient-based methods, addressing preference collapse.
Contribution
This work demonstrates that population-based evolutionary methods outperform gradient descent in maintaining diverse LLM preferences, with theoretical and empirical validation.
Findings
EvoPref improves preference coverage by 18% in relative terms over gradient-based baselines (median 82.5% vs. 70.0% for ORPO).
EvoPref reduces collapse rates by 47% in relative terms (11.0% vs. 20.6%).
EvoPref achieves competitive alignment quality.
Abstract
Gradient-based preference optimization methods for large language model (LLM) alignment suffer from preference collapse, converging to narrow behavioral modes while neglecting preference diversity. We introduce EvoPref, a multi-objective evolutionary algorithm that maintains populations of Low-Rank Adaptation (LoRA) adapters optimized across helpfulness, harmlessness, and honesty objectives using Non-dominated Sorting Genetic Algorithm II (NSGA-II) selection with archive-based diversity preservation. Our primary contribution is demonstrating that population-based methods discover substantially more diverse alignments than gradient descent. On standard benchmarks, EvoPref improves preference coverage by 18% (median 82.5% vs. 70.0% for ORPO; Wilcoxon) and reduces collapse rates by 47% (11.0% vs. 20.6%), while achieving competitive alignment quality (median…
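To make the selection step in the abstract more concrete, the following is a minimal sketch of Pareto (non-dominated) ranking over the three objectives it names. It is not the authors' implementation: the function names, the score tuples, and the assumption that all three objectives are maximized are illustrative only. It uses a simple repeated-filtering approach rather than the fast non-dominated sort of NSGA-II, and it omits crowding distance and the archive-based diversity mechanism.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives maximized).

    `a` must be at least as good on every objective and strictly
    better on at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(scores: List[Sequence[float]]) -> List[List[int]]:
    """Group candidate indices into Pareto fronts.

    Front 0 holds candidates that no other candidate dominates; each
    later front is non-dominated once earlier fronts are removed.
    """
    remaining = set(range(len(scores)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(
            i
            for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical (helpfulness, harmlessness, honesty) scores for four
# adapters; these values are made up for illustration.
scores = [
    (0.9, 0.5, 0.7),
    (0.6, 0.9, 0.6),
    (0.5, 0.4, 0.5),
    (0.7, 0.7, 0.9),
]
fronts = non_dominated_sort(scores)
print(fronts)
```

In this toy population, three adapters trade off against each other and share the first front, while the third adapter is dominated on every objective. Keeping the whole first front, rather than a single best point, is what lets a population-based method retain several distinct alignment trade-offs at once.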
