Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction
Yun Yue, Yongchao Liu, Suo Tong, Minghao Li, Zhen Zhang, Chunyang Wen, Huanjun Bao, Lihong Gu, Jinjie Gu, Yixiang Mu

TL;DR
This paper introduces a new class of adaptive optimizers with sparse group lasso regularization for neural networks, improving model sparsity and performance in CTR prediction tasks.
Contribution
It proposes integrating sparse group lasso regularizers into popular adaptive optimizers and provides theoretical convergence guarantees in the stochastic convex setting, based on primal-dual methods.
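To make the regularizer concrete, here is a minimal sketch of the proximal operator of a sparse group lasso penalty of the form lam1·||x||_1 + lam21·Σ_g sqrt(d_g)·||x_g||_2. It assumes that each row of a weight matrix (for example, one feature's embedding vector in a CTR model) forms one group. The operator is wrapped in plain proximal SGD, a swapped-in technique chosen for simplicity. The paper's Group optimizers instead build the penalty into adaptive, primal-dual updates, so this is not their algorithm. The names `prox_sparse_group_lasso`, `prox_sgd_step`, `lam1` and `lam21` are hypothetical.

```python
import numpy as np

def prox_sparse_group_lasso(v, lam1, lam21):
    """Euclidean prox of lam1*||x||_1 + lam21*sum_g sqrt(d_g)*||x_g||_2.

    Assumption (illustrative): each row of v is one group, e.g. the
    embedding vector of a single input feature.
    """
    # Step 1: element-wise soft-thresholding handles the L1 term.
    u = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
    # Step 2: group soft-thresholding shrinks each row toward zero.
    d_g = v.shape[1]
    thresh = lam21 * np.sqrt(d_g)
    norms = np.linalg.norm(u, axis=1, keepdims=True)
    # Rows whose norm falls below the threshold become exactly zero.
    scale = np.maximum(1.0 - thresh / np.maximum(norms, 1e-12), 0.0)
    return u * scale

def prox_sgd_step(w, grad, lr, lam1, lam21):
    """One proximal SGD step: plain gradient step, then the prox with
    thresholds scaled by the learning rate (hypothetical helper)."""
    return prox_sparse_group_lasso(w - lr * grad, lr * lam1, lr * lam21)

# Usage: with a zero gradient, only the regularizer acts on w.
w = np.array([[0.5, -0.05, 0.3],
              [0.02, -0.01, 0.03]])
w_new = prox_sgd_step(w, np.zeros_like(w), lr=1.0, lam1=0.1, lam21=0.05)
# Row 1 (a "weak" feature) is zeroed as a whole group; in row 0 only
# the small middle coordinate is zeroed by the L1 term.
```

Applying the L1 soft-threshold first and the group shrinkage second is the standard closed form for this prox. It shows why the group term can remove an entire feature's embedding at once, a structured sparsity that element-wise pruning does not directly target.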
Findings
Significant performance improvements over the original optimizers followed by magnitude pruning, at the same sparsity level.
Achieves high sparsity levels with competitive or better accuracy.
Validated on large-scale ad click datasets with state-of-the-art models.
Abstract
We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, and AdaHessian, and accordingly create a new class of optimizers named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad, Group AdaHessian, etc. We establish convergence guarantees in the stochastic convex setting, based on primal-dual methods. We evaluate the regularization effect of our new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results reveal that, compared with the original optimizers followed by a post-processing procedure that uses magnitude pruning, the performance of the models can be significantly improved at the same sparsity level. Furthermore, in comparison to the cases…
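The baseline in this comparison trains with an original optimizer and then sparsifies by magnitude pruning. Below is a minimal sketch of that post-processing step, assuming global, unstructured pruning of a single weight matrix to a target sparsity without any fine-tuning. The paper's exact pruning configuration is not stated on this page. `magnitude_prune` and `sparsity` are hypothetical helper names.

```python
import numpy as np

def magnitude_prune(w, target_sparsity):
    """Zero the smallest-|w| entries so that about target_sparsity of
    all entries are zero (global, unstructured pruning; illustrative)."""
    flat = np.abs(w).ravel()
    k = int(round(target_sparsity * flat.size))  # number of entries to zero
    if k == 0:
        return w.copy()
    # Indices of the k smallest magnitudes (ties broken arbitrarily).
    idx = np.argpartition(flat, k - 1)[:k]
    pruned = w.flatten()  # flatten() always returns a copy
    pruned[idx] = 0.0
    return pruned.reshape(w.shape)

def sparsity(w):
    """Fraction of entries that are exactly zero."""
    return float(np.mean(w == 0))

# Usage: prune half of a small weight matrix after training.
w = np.array([[0.5, -0.05, 0.3],
              [0.02, -0.01, 0.03]])
p = magnitude_prune(w, 0.5)
```

Because the pruning happens after training, the model never adapts to the removed weights. A regularizer applied during optimization lets the remaining weights compensate, which is the gap the Group optimizers are designed to close.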
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research
Methods: Pruning · AMSGrad · ADAHESSIAN · Adam
