# On the Optimization Landscape of Tensor Decompositions

**Authors:** Rong Ge, Tengyu Ma

arXiv: 1706.05598 · 2017-06-20

## TL;DR

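This paper analyzes the optimization landscape of random over-complete tensor decomposition and shows that, among points whose objective value is slightly above the expectation of the function, every local maximum is an approximate global maximum. This helps explain why gradient ascent succeeds in practice on a problem that is NP-hard in the worst case.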

## Contribution

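It gives the first analysis that applies the Kac-Rice formula to counting the local minima of a highly structured random polynomial with dependent coefficients, the kind that arises in tensor decomposition, and uses it to establish conditions under which local maxima are approximately global.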

## Key findings

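- For any small constant $\epsilon > 0$, among points whose function value is at least a $(1+\epsilon)$ factor larger than the expectation of the function, all local maxima are approximate global maxima.
- Gradient ascent is guaranteed to solve the tensor decomposition problem even from an initialization that is barely better than a random guess.
- This is the first successful application of the Kac-Rice formula to counting the local minima of a highly structured random polynomial with dependent coefficients.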
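The gradient ascent claim above can be illustrated with a small numerical sketch. This summary does not state the objective, so the code assumes the common formulation $f(x) = \sum_i \langle a_i, x \rangle^4$ maximized over the unit sphere, with Gaussian components $a_i$. The function names, the toy sizes `d` and `n`, and the step-size rule `eta / f(x)` are illustrative choices, not taken from the paper, and a toy instance this small is far from the asymptotic regime the analysis addresses.

```python
import numpy as np


def objective(A, x):
    """Assumed objective f(x) = sum_i <a_i, x>^4, with a_i the rows of A."""
    return float(np.sum((A @ x) ** 4))


def gradient_ascent_on_sphere(A, x0, eta=0.1, iters=2000):
    """Projected gradient ascent on the unit sphere.

    The step size eta / f(x) is an illustrative choice (hypothetical),
    not the schedule analyzed in the paper.
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        grad = 4.0 * A.T @ (A @ x) ** 3          # Euclidean gradient of f
        grad -= (grad @ x) * x                   # project onto the tangent space at x
        x = x + (eta / objective(A, x)) * grad   # ascent step
        x /= np.linalg.norm(x)                   # retract back onto the sphere
    return x


rng = np.random.default_rng(0)
d, n = 20, 60                      # toy over-complete instance: n > d
A = rng.standard_normal((n, d))    # rows a_i ~ N(0, I_d) (assumed model)
x0 = rng.standard_normal(d)
x0 /= np.linalg.norm(x0)           # random initialization on the sphere
x_hat = gradient_ascent_on_sphere(A, x0)

# For a fixed unit vector x, E[<a_i, x>^4] = 3, so E[f(x)] = 3n.
print("3n       =", 3 * n)
print("f(x0)    =", objective(A, x0))
print("f(x_hat) =", objective(A, x_hat))

# Largest |cosine| between x_hat and any component direction a_i / ||a_i||.
cosines = np.abs(A @ x_hat) / np.linalg.norm(A, axis=1)
print("best |cos| =", cosines.max())
```

Comparing `f(x_hat)` against `3n` and inspecting `best |cos|` gives a rough sense of whether the iterate has left the "typical" level of the function and aligned with one component. Because this assumed $f$ is convex and `eta` is below 0.25, each step cannot decrease the objective, which is why the simple step rule is safe here.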

## Abstract

Non-convex optimization with local search heuristics has been widely used in machine learning, achieving many state-of-art results. It becomes increasingly important to understand why they can work for these NP-hard problems on typical data. The landscape of many objective functions in learning has been conjectured to have the geometric property that "all local optima are (approximately) global optima", and thus they can be solved efficiently by local search algorithms. However, establishing such property can be very difficult.

In this paper, we analyze the optimization landscape of the random over-complete tensor decomposition problem, which has many applications in unsupervised learning, especially in learning latent variable models. In practice, it can be efficiently solved by gradient ascent on a non-convex objective. We show that for any small constant $\epsilon > 0$, among the set of points with function values $(1+\epsilon)$-factor larger than the expectation of the function, all the local maxima are approximate global maxima. Previously, the best-known result only characterizes the geometry in small neighborhoods around the true components. Our result implies that even with an initialization that is barely better than the random guess, the gradient ascent algorithm is guaranteed to solve this problem.

Our main technique uses Kac-Rice formula and random matrix theory. To our best knowledge, this is the first time when Kac-Rice formula is successfully applied to counting the number of local minima of a highly-structured random polynomial with dependent coefficients.
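To make the "$(1+\epsilon)$-factor larger than the expectation" threshold concrete, here is a short worked calculation under an assumption this summary does not state: that the objective is $f(x) = \sum_{i=1}^{n} \langle a_i, x \rangle^4$ over unit vectors $x \in \mathbb{R}^d$, with components $a_i$ drawn i.i.d. from $\mathcal{N}(0, I_d)$. The symbols $a_i$, $n$, and $d$ are introduced here for illustration.

$$
\|x\| = 1 \;\Longrightarrow\; \langle a_i, x \rangle \sim \mathcal{N}(0, 1)
\;\Longrightarrow\; \mathbb{E}\big[\langle a_i, x \rangle^4\big] = 3
\;\Longrightarrow\; \mathbb{E}\big[f(x)\big] = 3n .
$$

Under that assumption, the points covered by the result are those with $f(x) \ge 3(1+\epsilon)n$. By contrast, at $x = a_j / \|a_j\|$ the $j$-th term alone contributes $\|a_j\|^4$, which concentrates around $d^2$ and far exceeds $3n$ when $n$ is much smaller than $d^2$.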

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.05598/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/1706.05598/full.md

---
Source: https://tomesphere.com/paper/1706.05598