Empirical study of PROXTONE and PROXTONE$^+$ for Fast Learning of Large Scale Sparse Models

Ziqiang Shi; Rujie Liu

arXiv:1604.05024 · cs.LG · April 19, 2016

TL;DR

This paper introduces PROXTONE$^+$, a hybrid optimization method combining PROXTONE and first-order techniques to efficiently train large-scale sparse neural networks, achieving faster convergence and significant model size reduction.

Contribution

It proposes a novel hybrid training approach, PROXTONE$^+$, that accelerates convergence and enhances sparsity in large-scale neural network training.

Findings

01. PROXTONE and PROXTONE$^+$ double convergence speed.

02. PROXTONE$^+$ reduces model size to 0.5%.

03. Both methods outperform traditional first-order methods.

Abstract

PROXTONE is a novel and fast method for optimizing large-scale non-smooth convex problems \cite{shi2015large}. In this work, we apply PROXTONE to large-scale \emph{non-smooth non-convex} problems, for example training sparse deep neural networks (sparse DNNs) or sparse convolutional neural networks (sparse CNNs) for embedded or mobile devices. PROXTONE converges much faster than first-order methods, whereas first-order methods make it easy to derive and control the sparsity of the solutions. Thus, in some applications, to train sparse models quickly, we propose combining the merits of both methods: we use PROXTONE for the first several epochs to reach the neighborhood of an optimal solution, and then switch to a first-order method to explore the possibility of sparsity in the remaining training. We call this method PROXTONE plus (PROXTONE$^{+}$). Both…
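
The two-phase schedule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it does not implement PROXTONE itself: as a loudly labeled stand-in for the fast first phase it uses a diagonally scaled proximal step, and for the second phase it uses plain proximal gradient descent with soft-thresholding. The toy problem (an L1-regularized least-squares fit, which is convex, unlike the sparse DNN/CNN setting of the paper) and all names such as `two_phase_train`, `switch_epoch`, and `make_data` are assumptions made for illustration only.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(X, y, w, lam):
    """(1/2n) * ||Xw - y||^2 + lam * ||w||_1."""
    n = len(X)
    loss = sum((sum(xj * wj for xj, wj in zip(row, w)) - yi) ** 2
               for row, yi in zip(X, y)) / (2 * n)
    return loss + lam * sum(abs(wj) for wj in w)


def gradient(X, y, w):
    """Gradient of the smooth least-squares term only."""
    n, d = len(X), len(w)
    g = [0.0] * d
    for row, yi in zip(X, y):
        r = sum(xj * wj for xj, wj in zip(row, w)) - yi
        for j in range(d):
            g[j] += r * row[j] / n
    return g


def curvature_bounds(X):
    """Absolute row sums of the Hessian (1/n) X^T X (a diagonal upper bound)."""
    n, d = len(X), len(X[0])
    H = [[sum(row[j] * row[k] for row in X) / n for k in range(d)]
         for j in range(d)]
    return [sum(abs(h) for h in Hj) for Hj in H]


def two_phase_train(X, y, lam, switch_epoch, total_epochs):
    """Hypothetical two-phase loop: curvature-scaled steps, then first-order."""
    w = [0.0] * len(X[0])
    D = curvature_bounds(X)   # per-coordinate curvature (assumed nonzero)
    L = max(D)                # one global step-size constant for phase 2
    history = []
    for epoch in range(total_epochs):
        g = gradient(X, y, w)
        if epoch < switch_epoch:
            # Phase 1: stand-in for PROXTONE (NOT the actual algorithm),
            # a proximal step scaled per coordinate by curvature.
            w = [soft_threshold(wj - gj / Dj, lam / Dj)
                 for wj, gj, Dj in zip(w, g, D)]
        else:
            # Phase 2: first-order proximal gradient step with step size 1/L.
            w = [soft_threshold(wj - gj / L, lam / L) for wj, gj in zip(w, g)]
        history.append(objective(X, y, w, lam))
    return w, history


def make_data(seed=0, n=40, d=6):
    """Toy data: only the first two of d features carry signal."""
    rng = random.Random(seed)
    true_w = [2.0, -3.0] + [0.0] * (d - 2)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [sum(xj * wj for xj, wj in zip(row, true_w)) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    w, history = two_phase_train(X, y, lam=0.1, switch_epoch=5, total_epochs=60)
    zeros = sum(1 for wj in w if wj == 0.0)
    print("final objective:", history[-1], "exact zeros:", zeros, "of", len(w))
```

In this sketch `switch_epoch` plays the role of "the first several epochs" in the abstract. How the switch point should be chosen for real sparse DNN or CNN training is not specified in the text above, so the value used here is arbitrary.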

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings