EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning

Lingxiao Kong; Cong Yang; Susanne Neufang; Oya Deniz Beyan; Zeyd Boukhers

arXiv:2505.02579 · cs.CL · July 10, 2025

TL;DR

EMORL introduces an ensemble RL framework for LLM fine-tuning that improves efficiency, scalability, and explainability by aggregating the hidden states of multiple models, each optimized for a different objective.

Contribution

This paper presents the first method to aggregate hidden states from multiple models in multi-objective RL fine-tuning, improving efficiency and flexibility.
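The following minimal Python sketch shows what aggregating hidden states can look like at a single decoding step. It rests on several assumptions. All objective-specific models share one architecture and hidden size. Their final-layer hidden states are combined by a normalized weighted sum. A shared output projection then turns the result into next-token logits. `aggregate_hidden_states`, `project_to_logits`, and the toy vectors are hypothetical illustrations, not code from the EMORL repository.

```python
from typing import List, Sequence

Vector = List[float]


def aggregate_hidden_states(hidden_states: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Return the normalized weighted sum of per-model hidden states.

    Hypothetical helper: each entry of `hidden_states` is assumed to be the
    final-layer hidden state that one objective-specific model produces for
    the same context at the same decoding step.
    """
    if not hidden_states or len(hidden_states) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    dim = len(hidden_states[0])
    if any(len(h) != dim for h in hidden_states):
        raise ValueError("all hidden states must have the same dimension")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]  # keep the mix on the same scale as one model
    return [sum(w * h[d] for w, h in zip(normalized, hidden_states)) for d in range(dim)]


def project_to_logits(hidden: Vector, output_matrix: Sequence[Vector]) -> Vector:
    """Map a hidden state to vocabulary logits (assumed shared output layer, one row per token)."""
    return [sum(a * b for a, b in zip(row, hidden)) for row in output_matrix]


# Toy example: three objective-specific models, hidden size 2, vocabulary of 3 tokens.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = aggregate_hidden_states(states, [0.5, 0.25, 0.25])        # [0.75, 0.5]
logits = project_to_logits(mixed, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # [0.75, 0.5, 1.25]
next_token = max(range(len(logits)), key=logits.__getitem__)       # greedy choice: 2
```

In a real model the vectors would be tensors of the model's hidden size, and the weights would be chosen by a search over candidate combinations instead of being fixed by hand.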

Findings

1. Significantly reduced training data and time consumption.
2. Enhanced scalability and explainability of the fine-tuning process.
3. Achieved comparable performance across multiple objectives.

Abstract

Recent advances in reinforcement learning (RL) for large language model (LLM) fine-tuning show promise in addressing multi-objective tasks, but they still face significant challenges: balancing competing objectives, low training efficiency, poor scalability, and limited explainability. Leveraging ensemble learning principles, we introduce an Ensemble Multi-Objective RL (EMORL) framework that fine-tunes multiple models, each on an individual objective, and then optimizes their aggregation after fine-tuning to improve efficiency and flexibility. Our method is the first to aggregate the hidden states of individual models, incorporating contextual information from multiple objectives. This approach is supported by a hierarchical grid search algorithm that identifies optimal weighted combinations. We evaluate EMORL on counselor reflection generation tasks, using text classification models to…
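The abstract states that a hierarchical grid search identifies optimal weighted combinations, but it does not describe the procedure. The sketch below shows one generic way a coarse-to-fine search over the weight simplex (all weights non-negative, summing to 1) could work. It first evaluates a coarse grid, then re-searches a finer grid only around the current best point. Several pieces are assumptions for illustration, not the paper's exact algorithm: the `score` callable (in EMORL's setting, presumably some combination of per-objective evaluation scores), the resolution schedule, the neighborhood rule, and the toy target.

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple

Weights = Tuple[float, ...]


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """Yield every way to split `total` into `parts` non-negative integers."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def hierarchical_grid_search(
    score: Callable[[Weights], float],
    n_models: int,
    resolutions: Sequence[int] = (4, 8, 16),
) -> Tuple[Weights, float, int]:
    """Coarse-to-fine search for weights on the simplex (hypothetical sketch).

    Level k evaluates grid points with spacing 1/resolutions[k]. After the
    first level, only points within one previous grid step (per coordinate)
    of the best point found so far are evaluated.
    """
    best_w: Optional[Weights] = None
    best_s = float("-inf")
    radius = 1.0  # the first level covers the whole simplex
    evaluations = 0
    for res in resolutions:
        anchor = best_w  # neighborhood center stays fixed during this level
        for counts in compositions(res, n_models):
            w = tuple(c / res for c in counts)
            if anchor is not None and max(abs(a - b) for a, b in zip(w, anchor)) > radius + 1e-12:
                continue  # outside the neighborhood of the previous best
            evaluations += 1
            s = score(w)
            if s > best_s:
                best_w, best_s = w, s
        radius = 1.0 / res
    if best_w is None:
        raise ValueError("resolutions must not be empty")
    return best_w, best_s, evaluations


# Toy usage: a hypothetical score that peaks at a known weight vector.
target = (0.6, 0.3, 0.1)


def toy_score(w: Weights) -> float:
    return -sum((a - b) ** 2 for a, b in zip(w, target))


weights, best, n_eval = hierarchical_grid_search(toy_score, n_models=3)
```

Each level evaluates only points near the previous best. The total number of evaluations therefore stays well below an exhaustive search at the finest resolution, which has 153 points for three models at resolution 16. The trade-off is that a coarse level can steer the search away from a better region if the score surface has several peaks.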

Code & Models

Repositories

engineerkong/emorl (PyTorch, official)
