Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models

Hailing Cheng; Tao Huang; Chen Zhu; Antonio Alonso

arXiv:2604.24708 · cs.LG · April 28, 2026

TL;DR

This paper introduces HDET, a scalable ensemble training method that explores learning rate configurations automatically during large model training, improving optimization and generalization without extra hyperparameter tuning.

Contribution

HDET repurposes data-parallel replicas for simultaneous learning rate exploration with a novel auto-LR controller, enabling self-adapting hyperparameter schedules during training.

Findings

01 HDET improves training efficiency and model performance.

02 The auto-LR controller adapts hyperparameters without additional tuning.

03 The framework generalizes to scalar hyperparameters other than the learning rate.

Abstract

Training large neural networks with data-parallel stochastic gradient descent allocates N GPU replicas to compute effectively identical updates -- a practice that leaves the rich space of learning rate configurations entirely unexplored during training. We propose Hyperparameter-Divergent Ensemble Training (HDET), a method that repurposes these replicas for simultaneous learning rate exploration at negligible communication overhead. HDET operates in alternating phases: a fan-out stage in which replicas train independently under a structured, symmetric spread of learning rates, and a converge stage in which parameters are averaged across all replicas via AllReduce every T steps. Building on this ensemble substrate, we further propose an automatic learning rate (auto-LR) controller that treats the relative training loss across replicas as a performance signal, updating the shared base…
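The fan-out and converge phases described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the log-spaced form of the symmetric spread, the `width` parameter, the toy quadratic loss, and the names `symmetric_lr_spread` and `hdet_toy` are all hypothetical, and a plain Python mean stands in for the AllReduce average. The auto-LR controller is left out because its update rule is cut off in the abstract above.

```python
import random

def symmetric_lr_spread(base_lr, n_replicas, width=0.5):
    """Assumed spread: multipliers evenly spaced in log2 space over [-width, +width].

    The abstract only says the spread is structured and symmetric;
    this particular form is an illustration, not the paper's definition.
    """
    if n_replicas == 1:
        return [base_lr]
    return [base_lr * 2.0 ** (width * (2.0 * i / (n_replicas - 1) - 1.0))
            for i in range(n_replicas)]

def loss(w):
    """Toy objective 0.5 * ||w||^2, standing in for the training loss."""
    return 0.5 * sum(wi * wi for wi in w)

def noisy_grad(w, rng, noise=0.1):
    """Gradient of the toy loss plus Gaussian noise, mimicking a minibatch gradient."""
    return [wi + rng.gauss(0.0, noise) for wi in w]

def hdet_toy(w0, base_lr=0.1, n_replicas=4, T=5, rounds=20, seed=0):
    """Alternate fan-out (T independent SGD steps per replica) and converge (averaging)."""
    rng = random.Random(seed)
    lrs = symmetric_lr_spread(base_lr, n_replicas)
    shared = list(w0)
    for _ in range(rounds):
        # Fan-out: every replica starts from the shared parameters
        # and trains independently with its own learning rate.
        replicas = [list(shared) for _ in range(n_replicas)]
        for _ in range(T):
            for r, lr in enumerate(lrs):
                g = noisy_grad(replicas[r], rng)
                replicas[r] = [wi - lr * gi for wi, gi in zip(replicas[r], g)]
        # Converge: average parameters across replicas
        # (a stand-in for an AllReduce mean every T steps).
        shared = [sum(col) / n_replicas for col in zip(*replicas)]
    return shared, lrs

if __name__ == "__main__":
    w_final, lrs = hdet_toy([1.0, -2.0])
    print("learning rates:", lrs)
    print("final loss:", loss(w_final))
```

In this sketch the replicas communicate only once per round of T local steps, which matches the low communication overhead claimed in the abstract. The per-replica losses at the end of each fan-out are the kind of signal a controller could compare, but how the paper uses them is not recoverable from the truncated text.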

Code & Models

Repository: hailingc/ensemble_training (GitHub)
