A Non-Monotone Preconditioned Trust-Region Method for Neural Network Training

Andrea Angino, Bindi Çapriqi, Shega Likaj, Ken Trotti, Rolf Krause

arXiv:2605.14860 · math.OC · May 15, 2026

TL;DR

This paper introduces NAPTS, a non-monotone preconditioned trust-region method for neural network training. It reduces CPU time and the number of rejected steps by using a windowed acceptance criterion that tolerates controlled increases in the objective.

Contribution

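It proposes a non-monotone variant of the Additively Preconditioned Trust-Region Strategy (APTS) that uses a nonlinear additive Schwarz preconditioner, combining parallel subdomain corrections with global coarse-space directions to make parallel training more efficient.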
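
The additive combination of subdomain corrections can be illustrated with a minimal Python sketch. It assumes disjoint subdomains given as index lists and uses a few plain gradient-descent iterations (hypothetical `lr` and `local_iters` parameters) as a stand-in for the local trust-region solves. The coarse-space directions and the global trust-region step are omitted, so this is not the authors' implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def additive_schwarz_step(
    grad: Callable[[Vector], Vector],
    x: Vector,
    subdomains: Sequence[Sequence[int]],
    lr: float = 0.1,
    local_iters: int = 10,
) -> Vector:
    """Additive combination of local subdomain corrections (illustrative sketch).

    Each subdomain updates only its own parameter indices while all other
    parameters stay frozen at x, so the local problems are independent and
    could run in parallel. Plain gradient descent stands in for the local
    solver (an assumption); coarse-space directions are omitted.
    """
    correction = [0.0] * len(x)
    for idx in subdomains:
        y = list(x)  # local copy; only the entries in idx are updated
        for _ in range(local_iters):
            g = grad(y)
            for i in idx:
                y[i] -= lr * g[i]
        for i in idx:
            correction[i] += y[i] - x[i]  # add this subdomain's correction
    return correction


# Separable toy objective f(v) = sum((v_i - c_i)^2), split into two subdomains.
c = [1.0, -2.0, 3.0, 0.5]


def grad(v: Vector) -> Vector:
    return [2.0 * (vi - ci) for vi, ci in zip(v, c)]


s = additive_schwarz_step(grad, [0.0] * 4, [[0, 1], [2, 3]], lr=0.25, local_iters=50)
print([round(si, 6) for si in s])  # [1.0, -2.0, 3.0, 0.5]
```

On a separable objective the summed correction lands on the minimizer. When subdomains are coupled, the sum of independently computed corrections can overshoot, which is why the combined step is checked by a global trust-region mechanism.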

Findings

01. Reduces CPU time by 30% compared with APTS.

02. Cuts the number of rejected steps to one third of those in APTS.

03. Preserves training accuracy while improving efficiency.

Abstract

Training deep neural networks at scale can benefit from domain decomposition, where the network is split into subdomains trained in parallel and coupled by a global trust-region mechanism. Building on the Additively Preconditioned Trust-Region Strategy (APTS), we propose a non-monotone variant with a nonlinear additive Schwarz preconditioner that combines parallel subdomain corrections with global coarse-space directions. A windowed acceptance criterion allows controlled objective increases, avoiding needless rejection of effective coarse steps. The resulting non-monotone APTS (NAPTS) preserves accuracy while reducing CPU time by 30% relative to APTS and cutting rejected steps to one third of those in APTS.
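
A windowed acceptance test of this kind can be sketched in Python. The sketch assumes the common max-over-window reference value used in non-monotone line-search methods and textbook radius-update thresholds (0.25, 0.75). The class name `NonMonotoneAcceptance` and all of its parameters are hypothetical, and the exact criterion used in NAPTS may differ. With `window=1` the test reduces to the classical monotone ratio test, which makes the difference easy to see: a trial step that raises the objective from 3.0 to 3.2 is rejected by the monotone test but accepted while the window still contains the value 5.0.

```python
from collections import deque


class NonMonotoneAcceptance:
    """Windowed non-monotone trust-region acceptance test (illustrative sketch).

    Assumption: the reference value is the largest objective among the last
    `window` accepted iterates. With window=1 this is the classical monotone
    ratio test rho = (f(x_k) - f(x_k + s_k)) / pred_red.
    """

    def __init__(self, f0, window=5, eta=0.1, shrink=0.25, expand=2.0, delta_max=10.0):
        self.history = deque([f0], maxlen=window)  # recently accepted objective values
        self.eta = eta            # minimum ratio for accepting a step
        self.shrink = shrink      # radius factor after a poor step
        self.expand = expand      # radius factor after a very good boundary step
        self.delta_max = delta_max

    def ratio(self, f_trial, pred_red):
        # Actual reduction w.r.t. the windowed reference, over the reduction
        # predicted by the local model (pred_red must be positive).
        f_ref = max(self.history)
        return (f_ref - f_trial) / pred_red

    def step(self, f_trial, pred_red, step_norm, delta):
        """Return (accepted, new_delta); record f_trial in the window if accepted."""
        rho = self.ratio(f_trial, pred_red)
        accepted = rho >= self.eta
        if accepted:
            self.history.append(f_trial)
        # Textbook radius update driven by the same non-monotone ratio
        # (an assumption; variants also use the monotone ratio here).
        if rho < 0.25:
            delta = self.shrink * delta
        elif rho > 0.75 and step_norm >= 0.99 * delta:
            delta = min(self.expand * delta, self.delta_max)
        return accepted, delta


# Accepted values 5.0 -> 4.0 -> 3.0; the trial step raises f to 3.2.
monotone = NonMonotoneAcceptance(5.0, window=1)
windowed = NonMonotoneAcceptance(5.0, window=3)
for f in (4.0, 3.0):
    monotone.history.append(f)
    windowed.history.append(f)
print(monotone.step(3.2, pred_red=1.0, step_norm=1.0, delta=1.0))  # (False, 0.25)
print(windowed.step(3.2, pred_red=1.0, step_norm=1.0, delta=1.0))  # (True, 2.0)
```

The window length trades robustness for flexibility: a longer window tolerates larger temporary increases, while `window=1` recovers strict monotone acceptance.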