A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization
Min Gan, Guang-Yong Chen, Yang Yi, Lin Yang

TL;DR
This paper shows how variable elimination reshapes the optimization landscape of non-convex problems: saddle points of the original objective become local maxima of the reduced objective, which improves convergence and stability when training machine learning models.
Contribution
It provides a rigorous geometric analysis of variable elimination, showing how it simplifies the landscape by converting saddle points into local maxima, and validates this in neural network training.
Findings
Variable elimination turns saddle points of the original objective into local maxima of the reduced objective.
The approach improves stability and convergence in deep residual networks.
Landscape simplification guides the design of robust optimization algorithms.
Abstract
The proliferation of saddle points, rather than poor local minima, is increasingly understood to be a primary obstacle in large-scale non-convex optimization for machine learning. Variable elimination algorithms, like Variable Projection (VarPro), have long been observed to exhibit superior convergence and robustness in practice, yet a principled understanding of why they so effectively navigate these complex energy landscapes has remained elusive. In this work, we provide a rigorous geometric explanation by comparing the optimization landscapes of the original and reduced formulations. Through a rigorous analysis based on Hessian inertia and the Schur complement, we prove that variable elimination fundamentally reshapes the critical point structure of the objective function, revealing that local maxima in the reduced landscape are created from, and correspond directly to, saddle points…
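The Hessian-inertia and Schur-complement argument mentioned in the abstract can be illustrated on a two-variable toy problem. The sketch below is not taken from the paper: the function f(x, y) = (x^2 - 1)^2 + (y - x)^2 and the helper names `full_hessian`, `schur_complement`, and `inertia` are assumptions chosen for illustration. Eliminating y (setting y = x, its minimizer for fixed x) gives the reduced objective g(x) = (x^2 - 1)^2. The origin is a saddle of f, and the Schur complement of the y-block of the Hessian there equals g''(0), which is negative, so the same point is a local maximum of g.

```python
import numpy as np

def full_hessian(x, y):
    # Analytic Hessian of the toy objective f(x, y) = (x**2 - 1)**2 + (y - x)**2.
    # (Illustrative function, assumed here; not from the paper.)
    # The Hessian of this particular f happens not to depend on y.
    f_xx = 12.0 * x**2 - 4.0 + 2.0
    f_xy = -2.0
    f_yy = 2.0
    return np.array([[f_xx, f_xy], [f_xy, f_yy]])

def schur_complement(H):
    # Schur complement of the eliminated (y) block: H_xx - H_xy * H_yy^{-1} * H_yx.
    # When H_yy > 0 this is the curvature of the reduced objective g(x) = min_y f(x, y).
    return H[0, 0] - H[0, 1] * H[1, 0] / H[1, 1]

def inertia(M, tol=1e-12):
    # Inertia = (number of positive, negative, zero eigenvalues) of a symmetric matrix.
    eig = np.linalg.eigvalsh(np.atleast_2d(M))
    return (int((eig > tol).sum()),
            int((eig < -tol).sum()),
            int((np.abs(eig) <= tol).sum()))

# Critical point of f at the origin (both partial derivatives vanish there).
H = full_hessian(0.0, 0.0)
full_inertia = inertia(H)            # one positive, one negative eigenvalue: a saddle
reduced_curvature = schur_complement(H)
reduced_inertia = inertia(reduced_curvature)  # one negative eigenvalue: a local maximum

print("full Hessian inertia (+, -, 0):", full_inertia)
print("reduced curvature (Schur complement):", reduced_curvature)
print("reduced inertia (+, -, 0):", reduced_inertia)
```

The inertia of the full Hessian is the sum of the inertia of the positive eliminated block and the inertia of the Schur complement (Haynsworth's additivity formula), which is why the single negative direction of the saddle must show up in the reduced problem.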
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Tensor decomposition and applications
