Efficient Hyperparameter Search for Non-Stationary Model Training
Berivan Isik, Matthew Fahrbach, Dima Kuzmin, Nicolas Mayoraz, Emil Praun, Steffen Rendle, Raghavendra Vasudeva

TL;DR
This paper presents a two-stage hyperparameter search method tailored to non-stationary online learning systems, reducing search cost by up to 10× on the Criteo dataset while maintaining model performance.
Contribution
The authors introduce a novel two-stage framework with data reduction and prediction strategies specifically designed for non-stationary data, improving hyperparameter search efficiency.
Findings
Up to 10× reduction in hyperparameter search cost on the Criteo dataset
Efficiency gains validated in a large-scale industrial advertising system
Effective identification of promising configurations in non-stationary environments
Abstract
Online learning is the cornerstone of applications like recommendation and advertising systems, where models continuously adapt to shifting data distributions. Model training for such systems is remarkably expensive, a cost that multiplies during hyperparameter search. We introduce a two-stage paradigm to reduce this cost: (1) efficiently identifying the most promising configurations, and then (2) training only these selected candidates to their full potential. Our core insight is that focusing on accurate identification in the first stage, rather than achieving peak performance, allows for aggressive cost-saving measures. We develop novel data reduction and prediction strategies that specifically overcome the challenges of sequential, non-stationary data not addressed by conventional hyperparameter optimization. We validate our framework's effectiveness through a dual evaluation: first…
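The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the synthetic drifting stream, the one-parameter logistic model, and the uniform thinning used for data reduction are all assumptions made for illustration, and every name here (make_stream, online_logloss, two_stage_search, keep_every, top_k) is hypothetical. The summary does not specify the paper's actual data reduction and prediction strategies.

```python
import math
import random


def make_stream(n, seed=0):
    """Synthetic non-stationary stream: the true weight drifts over time."""
    rng = random.Random(seed)
    stream = []
    for t in range(n):
        w_true = 1.0 + 2.0 * t / n  # slow concept drift (an assumption)
        x = rng.uniform(-1.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-w_true * x))
        y = 1 if rng.random() < p else 0
        stream.append((x, y))
    return stream


def online_logloss(stream, lr):
    """Train a 1-parameter logistic model online; return mean progressive log loss.

    Each example is scored before the model is updated on it.
    """
    w, total = 0.0, 0.0
    for x, y in stream:
        z = max(-30.0, min(30.0, w * x))  # clip the logit to avoid overflow
        p = 1.0 / (1.0 + math.exp(-z))
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        w -= lr * (p - y) * x  # SGD step on the log loss
    return total / len(stream)


def two_stage_search(stream, configs, keep_every=10, top_k=2):
    """Rank configs cheaply on a thinned stream, then fully train only the top_k."""
    # Stage 1: identify promising configurations on a reduced stream.
    # Uniform thinning is a stand-in for the paper's data reduction strategies.
    reduced = stream[::keep_every]
    ranked = sorted(configs, key=lambda lr: online_logloss(reduced, lr))
    finalists = ranked[:top_k]
    # Stage 2: spend the full training budget only on the finalists.
    full = {lr: online_logloss(stream, lr) for lr in finalists}
    best = min(full, key=full.get)
    return best, finalists, full


if __name__ == "__main__":
    data = make_stream(5000, seed=0)
    best, finalists, full = two_stage_search(data, [0.001, 0.01, 0.1, 1.0])
    print("finalists:", finalists)
    print("best learning rate:", best, "loss:", full[best])
```

In this sketch stage 1 only needs to rank configurations correctly, not reach peak performance, which is why training on a fraction of the stream can be acceptable. With 4 configurations, keep_every=10 and top_k=2, the total cost is roughly 4 × 0.1 + 2 = 2.4 full training runs instead of 4.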
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Data Stream Mining Techniques
