A Non-Monotone Preconditioned Trust-Region Method for Neural Network Training
Andrea Angino, Bindi Çapriqi, Shega Likaj, Ken Trotti, Rolf Krause

TL;DR
This paper introduces NAPTS, a non-monotone trust-region method for neural network training that reduces CPU time and rejected steps relative to APTS by pairing a nonlinear additive Schwarz preconditioner with a windowed acceptance criterion.
Contribution
It proposes a non-monotone variant of APTS with a nonlinear additive Schwarz preconditioner, enhancing parallel training efficiency.
Findings
Reduces CPU time by 30% compared to APTS.
Cuts rejected steps to one third of those in APTS.
Maintains training accuracy with improved efficiency.
Abstract
Training deep neural networks at scale can benefit from domain decomposition, where the network is split into subdomains trained in parallel and coupled by a global trust-region mechanism. Building on the Additively Preconditioned Trust-Region Strategy (APTS), we propose a non-monotone variant with a nonlinear additive Schwarz preconditioner that combines parallel subdomain corrections with global coarse-space directions. A windowed acceptance criterion allows controlled objective increases, avoiding needless rejection of effective coarse steps. The resulting non-monotone APTS (NAPTS) preserves accuracy while reducing CPU time by 30% and cutting rejected steps to one third of those in APTS.
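To make the windowed acceptance idea concrete, here is a minimal Python sketch of one common non-monotone trust-region test, in which the trial loss is compared against the largest loss among the last few accepted iterates rather than against the current loss alone. The abstract does not spell out the exact rule used in NAPTS, so the max-over-window reference value, the class name `WindowedAcceptance`, and the parameters `window` and `eta` are all assumptions made for illustration.

```python
from collections import deque


class WindowedAcceptance:
    """Illustrative non-monotone acceptance test (hypothetical, not the paper's code)."""

    def __init__(self, initial_loss, window=5, eta=0.1):
        # Losses of the most recent accepted iterates; older entries drop out.
        self.history = deque([initial_loss], maxlen=window)
        # Minimum ratio of actual to predicted reduction for acceptance.
        self.eta = eta

    def accept(self, trial_loss, predicted_reduction):
        # A step whose model predicts no decrease is rejected outright.
        if predicted_reduction <= 0.0:
            return False
        # Assumed reference value: the worst loss inside the window.
        reference = max(self.history)
        rho = (reference - trial_loss) / predicted_reduction
        accepted = rho >= self.eta
        if accepted:
            self.history.append(trial_loss)
        return accepted
```

Under this assumed rule, a step that raises the loss from 0.5 to 0.7 can still be accepted while an earlier loss of 1.0 remains in the window, whereas a strictly monotone test would reject it. Once the larger values leave the window, the same kind of increase is rejected again, which is what keeps the permitted increases controlled.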
