Nonconvex Nonsmooth Multicomposite Optimization and Its Applications to Recurrent Neural Networks

Lingzi Jin; Xiao Wang; Xiaojun Chen

arXiv:2506.17884·math.OC·March 13, 2026·SIAM J. Optim.

TL;DR

This paper develops a theoretical framework for nonconvex nonsmooth multicomposite optimization problems, characterizing their first-order and second-order d(irectional)-stationary points and applying the results to the training of recurrent neural networks (RNNs).

Contribution

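It characterizes d-stationary points of nonconvex nonsmooth multicomposite problems through a closed-form tangent cone of the constrained reformulation, and relates the original problem to its constrained and $\ell_{1}$-penalty reformulations, with applications to recurrent neural networks.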

Findings

01

Derived a closed-form expression of the tangent cone for the feasible region of the constrained reformulation.

02

Established equivalence between the original problem and its constrained and $\ell_{1}$-penalty reformulations in terms of global optimality and d-stationarity.

03

Applied the theoretical results to the training process of recurrent neural networks.

Abstract

We consider a class of nonconvex nonsmooth multicomposite optimization problems where the objective function consists of a Tikhonov regularizer and a composition of multiple nonconvex nonsmooth component functions. Such optimization problems arise from tangible applications in machine learning and beyond. To define and compute its first-order and second-order d(irectional)-stationary points effectively, we first derive the closed-form expression of the tangent cone for the feasible region of its constrained reformulation. Building on this, we establish its equivalence with the corresponding constrained and $\ell_{1}$-penalty reformulations in terms of global optimality and d-stationarity. The equivalence offers indirect methods to attain the first-order and second-order d-stationary points of the original problem in certain cases. We apply our results to the training process of recurrent…
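To make the three formulations named in the abstract concrete, here is a minimal Python sketch. It is not the paper's model: it assumes a hypothetical scalar-state RNN with ReLU activation, a squared loss on the final output, and illustrative names (`forward_states`, `constraint_residuals`, `l1_penalty_objective`, `lam`, `beta`). The composite objective nests the nonsmooth recursion T times; the constrained reformulation treats the hidden states as free variables tied together by equality constraints; the $\ell_{1}$-penalty version moves those constraints into the objective as absolute residuals.

```python
# Illustrative sketch only: a scalar-state ReLU RNN, assumed for exposition.
# The paper's actual problem class and notation may differ.

def relu(z):
    return max(z, 0.0)

def forward_states(theta, xs):
    """Hidden states h_1..h_T from h_t = relu(w*h_{t-1} + u*x_t + b), h_0 = 0."""
    w, u, b, _ = theta
    h, states = 0.0, []
    for x in xs:
        h = relu(w * h + u * x + b)
        states.append(h)
    return states

def tikhonov(theta, lam):
    """Tikhonov regularizer lam * ||theta||^2."""
    return lam * sum(p * p for p in theta)

def original_objective(theta, xs, target, lam):
    """Multicomposite form: the loss depends on theta through T nested ReLU maps."""
    v = theta[3]
    h_T = forward_states(theta, xs)[-1]
    return (v * h_T - target) ** 2 + tikhonov(theta, lam)

def constraint_residuals(theta, hs, xs):
    """Residuals h_t - relu(w*h_{t-1} + u*x_t + b), with each h_t a free variable."""
    w, u, b, _ = theta
    prev, residuals = 0.0, []
    for h, x in zip(hs, xs):
        residuals.append(h - relu(w * prev + u * x + b))
        prev = h
    return residuals

def l1_penalty_objective(theta, hs, xs, target, lam, beta):
    """Loss in (theta, hs) plus beta times the l1 norm of the constraint residuals."""
    v = theta[3]
    penalty = beta * sum(abs(r) for r in constraint_residuals(theta, hs, xs))
    return (v * hs[-1] - target) ** 2 + tikhonov(theta, lam) + penalty

# Hypothetical parameters (w, u, b, v), inputs, and weights.
theta = (0.5, 1.0, -0.25, 2.0)
xs = [1.0, -2.0, 0.5]
target, lam, beta = 1.0, 0.1, 10.0

hs_feasible = forward_states(theta, xs)   # satisfies every constraint
hs_infeasible = [0.75, 0.1, 0.25]         # violates the recursion at t = 2 and t = 3

f_orig = original_objective(theta, xs, target, lam)
f_pen_feasible = l1_penalty_objective(theta, hs_feasible, xs, target, lam, beta)
f_pen_infeasible = l1_penalty_objective(theta, hs_infeasible, xs, target, lam, beta)
```

At a feasible point the residuals vanish, so the penalized and original objectives coincide; at an infeasible point the penalty term is strictly positive. This only illustrates how the formulations relate structurally. The equivalence in global optimality and d-stationarity is what the paper establishes, under its own conditions.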
