Less is More: Convergence Benefits of Fewer Data Weight Updates over Longer Horizon
Rudrajit Das, Neel Patel, Meisam Razaviyayn, and Vahab Mirrokni

TL;DR
This paper analyzes the convergence of data mixing in bilevel optimization for training robust models, revealing that using a single inner update often fails and that optimal inner steps scale logarithmically with the total update budget.
Contribution
The paper provides a rigorous theoretical analysis of the convergence behavior of data mixing with finite inner steps, establishing optimal scaling laws for the number of inner updates.
Findings
Using a single inner update ($T=1$) can fail in simple cases.
Optimal number of inner steps scales as $\Theta(\log N)$ with total update budget.
Theoretical results are supported by proof-of-concept experiments.
Abstract
Data mixing--the strategic reweighting of training domains--is a critical component in training robust machine learning models. This problem is naturally formulated as a bilevel optimization task, where the outer loop optimizes domain weights to minimize validation loss, and the inner loop optimizes model parameters to minimize the weighted training loss. Classical bilevel optimization relies on hypergradients, which theoretically require the inner optimization to reach convergence. However, due to computational constraints, state-of-the-art methods use a finite, often small, number of inner update steps before updating the weights. The theoretical implications of this approximation are not well understood. In this work, we rigorously analyze the convergence behavior of data mixing with a finite number of inner steps $T$. We prove that the "greedy" practical approach of using $T=1$ can…
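The alternating structure described in the abstract ($T$ inner parameter steps, then one domain-weight update) can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the paper's algorithm: the two 1-D quadratic domain losses, the validation target, the step sizes, the truncated unrolled hypergradient, and the exponentiated-gradient weight update are all hypothetical choices made for this example.

```python
import numpy as np

def run_data_mixing(T, outer_rounds, lr_theta=0.1, lr_w=0.5):
    """Alternate T inner parameter steps with one domain-weight update.

    Toy setup (all values are assumptions for illustration):
      domain losses   L_i(theta) = 0.5 * (theta - c_i)^2
      validation loss L_val(theta) = 0.5 * (theta - v)^2
    """
    centers = np.array([-1.0, 2.0])   # hypothetical domain optima c_i
    val_target = 1.0                  # hypothetical validation optimum v
    w = np.array([0.5, 0.5])          # domain weights on the simplex
    theta = 0.0                       # model parameter

    for _ in range(outer_rounds):
        # jac[i] tracks d(theta)/d(w_i) through the T unrolled inner steps.
        # Resetting it each round truncates the hypergradient to this round.
        jac = np.zeros_like(w)
        for _ in range(T):
            grad = np.sum(w * (theta - centers))  # weighted training gradient
            # Differentiate the update theta <- theta - lr * grad w.r.t. w_i.
            jac = jac - lr_theta * (w.sum() * jac + (theta - centers))
            theta = theta - lr_theta * grad

        # Approximate hypergradient of the validation loss w.r.t. the weights.
        hypergrad = (theta - val_target) * jac
        # Exponentiated-gradient step keeps w non-negative and normalized.
        w = w * np.exp(-lr_w * hypergrad)
        w = w / w.sum()

    val_loss = 0.5 * (theta - val_target) ** 2
    return w, theta, val_loss

if __name__ == "__main__":
    w, theta, val_loss = run_data_mixing(T=5, outer_rounds=200)
    print("weights:", w, "theta:", theta, "validation loss:", val_loss)
```

Under a fixed total budget of parameter updates, `T * outer_rounds`, varying `T` in this sketch trades the number of weight updates against the length of each inner horizon. The toy is not designed to reproduce the paper's failure case for $T=1$ or its scaling results.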
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
