Unified Low-rank Compression Framework for Click-through Rate Prediction
Hao Yu, Minghao Fu, Jiandong Ding, Yusheng Zhou, Jianxin Wu

TL;DR
This paper introduces a unified low-rank decomposition framework for compressing CTR prediction models, reducing memory and computation costs while improving inference speed and AUC performance.
Contribution
It proposes a novel low-rank compression method that locally compresses output features, outperforming traditional tensor decomposition approaches in CTR models.
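One way to picture "compressing output features" is sketched below. This is an illustration under stated assumptions, not the authors' implementation: it assumes the compression amounts to a PCA of a single linear layer's outputs on some calibration inputs, and the function name `compress_linear_by_output_pca` and all sizes are hypothetical. The layer y = Wx + b is replaced by two thinner layers whose composition reproduces the projection of y onto its top principal directions.

```python
import numpy as np

def compress_linear_by_output_pca(W, b, X, rank):
    """Split y = W @ x + b into two thinner layers via PCA of the outputs.

    W: (d_out, d_in), b: (d_out,), X: (n, d_in) calibration inputs.
    Returns W1 (rank, d_in), b1 (rank,), W2 (d_out, rank), b2 (d_out,).
    Assumption: PCA on output features; the paper's exact procedure may differ.
    """
    Y = X @ W.T + b                          # (n, d_out) output features
    mu = Y.mean(axis=0)                      # mean output feature
    cov = np.cov(Y, rowvar=False)            # (d_out, d_out) covariance
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :rank]           # top-`rank` principal directions
    # y ~= U U^T (y - mu) + mu = U (U^T W x + U^T (b - mu)) + mu
    W1 = U.T @ W
    b1 = U.T @ (b - mu)
    W2 = U
    b2 = mu
    return W1, b1, W2, b2

# Hypothetical sizes; W is built with rank 4 so the outputs are exactly low-rank.
rng = np.random.default_rng(0)
d_in, d_out, true_rank = 32, 16, 4
W = rng.normal(size=(d_out, true_rank)) @ rng.normal(size=(true_rank, d_in))
b = rng.normal(size=d_out)
X = rng.normal(size=(200, d_in))

W1, b1, W2, b2 = compress_linear_by_output_pca(W, b, X, rank=true_rank)
Y = X @ W.T + b
Y_hat = (X @ W1.T + b1) @ W2.T + b2          # two thin layers applied in sequence
max_err = float(np.abs(Y - Y_hat).max())
```

In this toy setup the outputs lie in a 4-dimensional affine subspace, so the two-layer replacement reproduces them up to floating-point error. With real layers the outputs are only approximately low-rank, and the chosen rank trades accuracy against size.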
Findings
Achieves 3-5x model size reduction with better AUC.
Faster inference speeds on benchmark datasets.
Applicable to embedding tables and MLP layers.
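To make the size arithmetic concrete, here is a minimal sketch of generic low-rank factorization applied to an embedding table. It uses plain truncated SVD, which is a textbook technique and not necessarily the method the paper proposes; the helper names `factorize_table` and `param_ratio` and all table sizes are hypothetical.

```python
import numpy as np

def factorize_table(table, rank):
    """Truncated SVD: table (rows, cols) ~= A (rows, rank) @ B (rank, cols)."""
    U, s, Vt = np.linalg.svd(table, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    B = Vt[:rank]
    return A, B

def param_ratio(rows, cols, rank):
    """Parameter count of the dense table divided by that of the two factors."""
    return (rows * cols) / (rank * (rows + cols))

# Hypothetical small table so the SVD runs quickly.
rng = np.random.default_rng(0)
table = rng.normal(size=(1000, 64))
A, B = factorize_table(table, rank=16)

# An embedding lookup becomes a row gather from A followed by a small matmul.
ids = np.array([3, 17, 999])
vectors = A[ids] @ B             # (3, 64), approximates table[ids]

# Size arithmetic for a hypothetical 100,000 x 64 table at rank 16:
# 6,400,000 dense parameters versus 16 * (100,000 + 64) = 1,601,024.
ratio = param_ratio(100_000, 64, 16)
```

The same factorization applies to an MLP weight matrix, where it also cuts multiply-adds roughly in proportion to the parameter ratio. A random table like the one above is not actually low-rank, so the approximation error here is large; the sketch only illustrates shapes and counts.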
Abstract
Deep Click-Through Rate (CTR) prediction models play an important role in modern industrial recommendation scenarios. However, high memory overhead and computational costs limit their deployment in resource-constrained environments. Low-rank approximation is an effective compression method for computer vision and natural language processing models, but its application to CTR prediction models has been less explored. Because memory and computing resources are limited, compressing CTR prediction models confronts three fundamental challenges: (1) How to reduce model size to fit edge devices? (2) How to speed up CTR prediction model inference? (3) How to retain the capabilities of the original model after compression? Previous low-rank compression research mostly uses tensor decomposition, which can achieve a high parameter compression ratio, but brings in AUC…
Taxonomy
Topics: Image and Video Quality Assessment · Advanced Data Compression Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
