Stochastic Trust-Region Methods for Over-parameterized Models

Aike Yang; Hao Wang

arXiv:2604.14017 · math.OC · April 16, 2026

TL;DR

This paper introduces a stochastic trust-region framework that removes manual step-size tuning and extends to equality-constrained problems, achieving competitive complexity bounds and stable optimization in deep learning tasks.

Contribution

It develops a unified stochastic trust-region method for unconstrained and equality-constrained optimization, with complexity guarantees under the strong growth condition and stable behavior in practice.

Findings

1. Achieves $O(\varepsilon^{-2}\log(1/\varepsilon))$ complexity for unconstrained problems.
2. Achieves $O(\varepsilon^{-4}\log(1/\varepsilon))$ complexity for constrained problems.
3. Demonstrates performance comparable to well-tuned baselines in neural network training, with stable behavior and constraint handling.
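To make the gap between the two complexity bounds listed above concrete, the sketch below evaluates only their leading-order expressions. It assumes every hidden constant equals 1 (the summary does not state the constants), so only the relative scaling is meaningful; the function names are hypothetical.

```python
import math


def unconstrained_bound(eps):
    """eps^-2 * log(1/eps): order of the unconstrained bound, constants assumed to be 1."""
    return eps ** -2 * math.log(1.0 / eps)


def constrained_bound(eps):
    """eps^-4 * log(1/eps): order of the constrained bound, constants assumed to be 1."""
    return eps ** -4 * math.log(1.0 / eps)


for eps in (1e-1, 1e-2, 1e-3):
    u, c = unconstrained_bound(eps), constrained_bound(eps)
    print(f"eps={eps:g}: unconstrained ~{u:.3g}, constrained ~{c:.3g}, ratio {c / u:.3g}")
```

Under this assumption the ratio of the two expressions is exactly $\varepsilon^{-2}$: tightening $\varepsilon$ tenfold multiplies the constrained expression by $10^4$ and the unconstrained one by $10^2$, each times the same ratio of logarithmic factors.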

Abstract

Under interpolation-type assumptions such as the strong growth condition, stochastic optimization methods can attain convergence rates comparable to full-batch methods, but their performance, particularly for SGD, remains highly sensitive to step-size selection. To address this issue, we propose a unified stochastic trust-region framework that eliminates manual step-size tuning and extends naturally to equality-constrained problems. For unconstrained optimization, we develop a first-order stochastic trust-region algorithm and show that, under the strong growth condition, it achieves an iteration and stochastic first-order oracle complexity of $O(\varepsilon^{-2}\log(1/\varepsilon))$ for finding an $\varepsilon$-stationary point. For equality-constrained problems, we introduce a quadratic-penalty-based stochastic trust-region method with penalty parameter $\mu$, and establish an…
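For readers new to stochastic trust-region methods, the following is a minimal sketch of a generic first-order iteration, not the authors' algorithm: the acceptance test, radius update, sampling scheme, and all parameter names (`eta`, `gamma_inc`, `gamma_dec`, `max_radius`, `batch_size`) are illustrative assumptions. Each step draws a minibatch, forms the linear model $m(s) = f_B(x) + g^\top s$, takes $s = -\Delta\, g/\|g\|$ (the minimizer of that model over the ball of radius $\Delta$), and accepts or rejects the step by comparing actual and predicted decrease on the same minibatch. The test problem is an over-parameterized linear least-squares fit with more parameters than samples, so an interpolating solution exists; with a full-row-rank data matrix it satisfies the strong growth condition, commonly written as $\mathbb{E}_\xi\|\nabla f(x;\xi)\|^2 \le \rho\,\|\nabla f(x)\|^2$.

```python
import numpy as np


def stochastic_trust_region(loss, grad, x0, n_samples, batch_size=8,
                            radius=1.0, max_radius=10.0, eta=0.1,
                            gamma_inc=2.0, gamma_dec=0.5, iters=500, seed=0):
    """Generic first-order stochastic trust-region loop (illustrative sketch).

    loss(x, idx) and grad(x, idx) return the minibatch loss and gradient on
    the sample indices idx. All parameter names and defaults are hypothetical
    choices, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        g = grad(x, idx)
        g_norm = np.linalg.norm(g)
        if g_norm == 0.0:
            continue  # minibatch already stationary; draw another one
        # Minimizer of the linear model m(s) = f_B(x) + g^T s over ||s|| <= radius.
        s = -(radius / g_norm) * g
        predicted = radius * g_norm  # m(0) - m(s)
        actual = loss(x, idx) - loss(x + s, idx)  # same minibatch as the model
        if actual / predicted >= eta:
            x = x + s  # successful step: accept and allow a larger region
            radius = min(gamma_inc * radius, max_radius)
        else:
            radius = gamma_dec * radius  # unsuccessful step: shrink the region
    return x


# Over-parameterized least squares: more parameters (d) than samples (n),
# so an interpolating solution with zero loss exists.
rng = np.random.default_rng(42)
n, d = 10, 200
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)


def loss(x, idx):
    r = A[idx] @ x - b[idx]
    return 0.5 * np.mean(r ** 2)


def grad(x, idx):
    r = A[idx] @ x - b[idx]
    return A[idx].T @ r / len(idx)


all_idx = np.arange(n)
x_hat = stochastic_trust_region(loss, grad, np.zeros(d), n_samples=n, batch_size=5)
print("initial loss:", loss(np.zeros(d), all_idx))
print("final loss:  ", loss(x_hat, all_idx))
```

No step size is supplied: a rejected step halves $\Delta$ and an accepted one doubles it (up to `max_radius`), so the effective step $\Delta/\|g\|$ adapts to the curvature of each minibatch. For the equality-constrained setting, one plausible reading of the abstract is to run a loop of this kind on a quadratic penalty function built from $f$ and the constraint residuals with parameter $\mu$; the exact penalty form and how $\mu$ is chosen are not stated in this summary.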
