HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models

Weixuan Wang; Minghao Wu; Barry Haddow; Alexandra Birch

arXiv:2505.12300·cs.CL·February 6, 2026

TL;DR

HBO introduces a hierarchical optimization method for fine-tuning large language models, dynamically balancing data across and within datasets to improve training effectiveness and accuracy.

Contribution

The paper presents a novel bilevel optimization approach with global and local actors for adaptive data balancing during LLM fine-tuning.

Findings

01

HBO outperforms existing baselines across multiple tasks.

02

Both global and local actors effectively adjust data usage.

03

Significant accuracy improvements are achieved with HBO.

Abstract

Fine-tuning large language models (LLMs) on a mixture of diverse datasets poses challenges due to data imbalance and heterogeneity. Existing methods often address these issues across datasets (globally) but overlook the imbalance and heterogeneity within individual datasets (locally), which limits their effectiveness. We introduce Hierarchical Balancing Optimization (HBO), a novel method that enables LLMs to autonomously adjust data allocation during fine-tuning both across datasets (globally) and within each individual dataset (locally). HBO employs a bilevel optimization strategy with two types of actors: a Global Actor, which balances data sampling across different subsets of the training mixture, and several Local Actors, which optimize data usage within each subset based on difficulty levels. These actors are guided by reward functions derived from the LLM's training state, which…
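The two-level sampling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Actor` class, the `sample_batch` helper, the learning rate, and the reward values are all hypothetical, and the rewards are passed in as plain numbers rather than being derived from an LLM's training state. The update shown is the expected policy-gradient (REINFORCE-style) step for a softmax policy, chosen here only because the reviews mention the Reinforce algorithm.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Actor:
    """Hypothetical categorical sampling policy over n choices."""

    def __init__(self, n, lr=0.1):
        self.logits = [0.0] * n  # uniform sampling at the start
        self.lr = lr

    def probs(self):
        return softmax(self.logits)

    def sample(self, rng):
        """Draw one choice index according to the current probabilities."""
        return rng.choices(range(len(self.logits)), weights=self.probs())[0]

    def reinforce(self, rewards):
        """Expected policy-gradient step on sum_i p_i * r_i.

        For a softmax policy the gradient w.r.t. logit j is
        p_j * (r_j - baseline), with baseline = sum_i p_i * r_i.
        """
        p = self.probs()
        baseline = sum(pi * ri for pi, ri in zip(p, rewards))
        for j in range(len(self.logits)):
            self.logits[j] += self.lr * p[j] * (rewards[j] - baseline)


def sample_batch(global_actor, local_actors, rng, batch_size):
    """Pick (subset, difficulty bucket) pairs: global choice first, then local."""
    picks = []
    for _ in range(batch_size):
        subset = global_actor.sample(rng)
        bucket = local_actors[subset].sample(rng)
        picks.append((subset, bucket))
    return picks


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed setup: 3 subsets, each split into 4 difficulty buckets.
    global_actor = Actor(3)
    local_actors = [Actor(4) for _ in range(3)]
    # Placeholder rewards; a real system would compute these from the model.
    for _ in range(50):
        global_actor.reinforce([0.2, 1.0, 0.5])
    print(global_actor.probs())
    print(sample_batch(global_actor, local_actors, rng, batch_size=4))
```

In this sketch the global actor shifts probability toward the subset with the highest placeholder reward, while each local actor stays uniform until it receives its own rewards. How HBO actually defines difficulty buckets and rewards is specified in the paper, not here.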

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 4·Confidence 3

Strengths

1) HBO effectively addresses both global and local data imbalances, providing a more comprehensive solution to the challenges of fine-tuning LLMs on diverse datasets. 2) The bilevel optimization framework with Global and Local Actors allows for fine-grained control over data sampling, leading to improved model performance across various tasks. 3) Extensive experiments demonstrate HBO's strong applicability across multiple LLM backbones and tasks, consistently outperforming existing baselines a

Weaknesses

1) My main concern is the proposed method adds more computations based on MoS. The reinforcement learning framework, as well as some reward, is similar to the MoS method. This work adds more actors and the grad norm reward, more insights of this field could be added. 2) This paper primarily compares three sampling balancing methods: MoS, MultiUAT, and MultiDDS. However, many of the results are similar to uniform sampling. What is the next step of this field could be discussed.

Reviewer 02·Rating 6·Confidence 4

Strengths

1. The paper tackles a significant and nuanced challenge in LLM fine-tuning by explicitly addressing hierarchical data imbalance and heterogeneity (both global, across datasets, and local, within datasets), which is often overlooked by simpler methods. 2. The proposed HBO mechanism, utilizing a bilevel optimization framework with distinct global and local actors guided by rewards derived from the model's own training state, is a novel and sophisticated approach to achieve autonomous, dynamic da

Weaknesses

1. The framework introduces substantial complexity compared to standard fine-tuning or simpler dynamic sampling. Implementing and tuning the bilevel optimization setup, managing multiple actors (one global, potentially many local), and ensuring stable training with the Reinforce algorithm likely requires significant expertise and effort. 2. The reported computational overhead, while quantified (~15%), is non-negligible and could be a barrier to practical adoption. This additional runtime cost

Reviewer 03·Rating 4·Confidence 4

Strengths

1. The paper targets an important problem, i.e., data imbalance and heterogeneity in LLM fine-tuning, which is relevant to current multi-task and multilingual training paradigms. 2. The hierarchical bilevel optimization formulation is conceptually interesting and provides a unified framework for global and local data balancing. 3. The paper is well-written and easy to follow.

Weaknesses

I have the following concerns. *If the authors could properly address them during the rebuttal phase, I am willing to raise my score.* 1. The technical novelty is somewhat limited. While the hierarchical structure and bilevel setup are well-motivated, they mainly combine known techniques such as policy gradients and dynamic sampling into a straightforward framework, without introducing fundamentally new optimization principles. 2. This paper lacks strong theoretical or analytical justification f

Code & Models

Repositories

weixuan-wang123/hbo
none·Official
