Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction
Kun Gai, Xiaoqiang Zhu, Han Li, Kai Liu, Zhe Wang

TL;DR
This paper presents LS-PLM, a scalable piece-wise linear model for CTR prediction that captures nonlinear patterns in large-scale sparse data and reduces the need for heavy manual feature engineering.
Contribution
The paper introduces a novel large-scale piece-wise linear model with an efficient optimization algorithm and a distributed system, enabling industrial-scale CTR prediction without heavy feature engineering.
Findings
LS-PLM effectively captures nonlinear patterns in massive sparse data.
The model is scalable and suitable for industrial deployment.
Since 2012, LS-PLM has been the main CTR prediction model in Alibaba's online display advertising system.
Abstract
CTR prediction in real-world business is a difficult machine learning problem with large scale nonlinear sparse data. In this paper, we introduce an industrial strength solution with a model named Large Scale Piece-wise Linear Model (LS-PLM). We formulate the learning problem with $L_1$ and $L_{2,1}$ regularizers, leading to a non-convex and non-smooth optimization problem. Then, we propose a novel algorithm to solve it efficiently, based on directional derivatives and quasi-Newton method. In addition, we design a distributed system which can run on hundreds of machines in parallel and provides us with the industrial scalability. LS-PLM model can capture nonlinear patterns from massive sparse data, saving us from heavy feature engineering jobs. Since 2012, LS-PLM has become the main CTR prediction model in Alibaba's online display advertising system, serving hundreds of millions of users every…
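To make the abstract's description concrete, here is a minimal NumPy sketch of a piece-wise linear CTR model of this kind: a softmax gate softly assigns each example to one of m regions, a logistic model makes the prediction within each region, and the objective adds $L_{2,1}$ and $L_1$ penalties to the negative log-likelihood. The function names, the dense inputs, and the hyperparameter names `lam` and `beta` are illustrative assumptions, not the authors' code; the paper's directional-derivative quasi-Newton optimizer and distributed training are not shown.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lsplm_predict(X, U, W):
    """Mixture prediction for a piece-wise linear model (illustrative sketch).

    X: (n, d) feature matrix.
    U: (d, m) gating weights, one column per region.
    W: (d, m) per-region logistic weights.
    Returns an (n,) array of click probabilities.
    """
    gate = softmax(X @ U)   # (n, m): soft assignment of each example to a region
    fit = sigmoid(X @ W)    # (n, m): logistic prediction inside each region
    return (gate * fit).sum(axis=1)


def lsplm_objective(X, y, U, W, lam=0.1, beta=0.1):
    """Negative log-likelihood plus L2,1 and L1 penalties.

    lam and beta are hypothetical names for the two regularization strengths.
    """
    p = np.clip(lsplm_predict(X, U, W), 1e-12, 1 - 1e-12)
    nll = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    theta = np.hstack([U, W])                      # (d, 2m): row i = all parameters of feature i
    l21 = np.sqrt((theta ** 2).sum(axis=1)).sum()  # sum of row norms: drops whole features
    l1 = np.abs(theta).sum()                       # element-wise sparsity
    return nll + lam * l21 + beta * l1
```

With a single region (m = 1) the gate is identically 1 and the sketch reduces to ordinary logistic regression. The absolute value in the $L_1$ term and the row norms in the $L_{2,1}$ term are not differentiable at zero, and the mixture likelihood is non-convex in the parameters, which is why the abstract describes the problem as non-convex and non-smooth.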
Taxonomy
Topics: Face and Expression Recognition · Image Retrieval and Classification Techniques · Imbalanced Data Classification Techniques
