One Model for All: Multi-Objective Controllable Language Models
Qiang He, Yucheng Yang, Tianyi Zhou, Meng Fang, Mykola Pechenizkiy, Setareh Maghsudi

TL;DR
This paper introduces Multi-Objective Control (MOC), a method for training a single large language model whose responses can be steered to match diverse user preferences across multiple objectives.
Contribution
The paper presents a novel multi-objective optimization approach integrated with RLHF to enable a single LLM to produce personalized outputs along the Pareto front, improving controllability and diversity.
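To make the terms "preference" and "Pareto front" concrete, here is a minimal sketch in plain Python. It is not the MOC training procedure. It only illustrates the underlying concepts: each candidate response has one reward per objective, a user preference is a weight vector over those objectives, and the Pareto front is the set of candidates that no other candidate beats on every objective. The reward values, the two-objective setup (say, helpfulness and humor), and the function names are all hypothetical, and weighted-sum scalarization is used purely as a familiar illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def scalarize(rewards: Sequence[float], preference: Sequence[float]) -> float:
    """Weighted sum of per-objective rewards under a preference vector."""
    if len(rewards) != len(preference):
        raise ValueError("rewards and preference must have the same length")
    return sum(r * w for r, w in zip(rewards, preference))


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def best_for(preference: Sequence[float], points: List[Point]) -> Point:
    """Pick the Pareto-optimal point with the highest weighted reward."""
    return max(pareto_front(points), key=lambda p: scalarize(p, preference))


# Hypothetical (helpfulness, humor) rewards for five candidate responses.
candidates: List[Point] = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]

front = pareto_front(candidates)
helpful_pick = best_for((0.8, 0.2), candidates)   # user who mostly wants helpfulness
humor_pick = best_for((0.2, 0.8), candidates)     # user who mostly wants humor
balanced_pick = best_for((0.5, 0.5), candidates)  # user who weighs both equally
```

In this toy setup, different preference vectors select different points on the same front. A preference-controllable model would aim to produce such preference-matched outputs directly from one set of weights, rather than by selecting among separately generated candidates.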
Findings
MOC outperforms baselines in controlling output preferences.
It enhances the quality and diversity of generated responses.
The method generalizes well to unseen user preferences.
Abstract
Aligning large language models (LLMs) with human preferences is critical for enhancing LLMs' safety, helpfulness, humor, faithfulness, etc. Current reinforcement learning from human feedback (RLHF) mainly focuses on a fixed reward learned from average human ratings, which may weaken the adaptability and controllability of varying preferences. However, creating personalized LLMs requires aligning LLMs with individual human preferences, which is non-trivial due to the scarce data per user and the diversity of user preferences in multi-objective trade-offs, varying from emphasizing empathy in certain contexts to demanding efficiency and precision in others. Can we train one LLM to produce personalized outputs across different user preferences on the Pareto front? In this paper, we introduce Multi-Objective Control (MOC), which trains a single LLM to directly generate responses in the…
