Unified Low-rank Compression Framework for Click-through Rate Prediction

Hao Yu; Minghao Fu; Jiandong Ding; Yusheng Zhou; Jianxin Wu

arXiv:2405.18146 · cs.IR · June 12, 2024

TL;DR

This paper introduces a unified low-rank decomposition framework for compressing CTR prediction models, reducing memory and computation costs while improving inference speed and AUC performance.

Contribution

It proposes a novel low-rank compression method that locally compresses output features, outperforming traditional tensor decomposition approaches in CTR models.
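To make "locally compresses output features" concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the compression runs PCA on the outputs of a single linear layer over a calibration batch and then splits that layer into two thinner ones. The function name `factorize_linear_by_output_pca` and all of its parameters are hypothetical.

```python
import numpy as np


def factorize_linear_by_output_pca(W, b, X, rank):
    """Split y = W x + b into two thinner layers via PCA of the layer outputs.

    Assumed shapes (illustrative, not taken from the paper):
      W: (out_dim, in_dim), b: (out_dim,), X: (n_samples, in_dim) calibration inputs.
    Returns (W1, b1, W2, b2) such that y is approximated by W2 @ (W1 @ x + b1) + b2.
    """
    Y = X @ W.T + b                           # layer outputs on the calibration batch
    mean = Y.mean(axis=0)                     # per-feature output mean
    cov = np.cov(Y - mean, rowvar=False)      # (out_dim, out_dim) output covariance
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues come back in ascending order
    U = eigvecs[:, ::-1][:, :rank]            # top-`rank` principal directions

    # y - mean is approximated by U U^T (y - mean), which gives
    # y ~ U (U^T W x + U^T (b - mean)) + mean.
    W1 = U.T @ W                              # (rank, in_dim)
    b1 = U.T @ (b - mean)                     # (rank,)
    W2 = U                                    # (out_dim, rank)
    b2 = mean                                 # (out_dim,)
    return W1, b1, W2, b2
```

The two factors hold `rank * (in_dim + out_dim)` weights instead of `in_dim * out_dim`, so the split only saves parameters when `rank` is well below both dimensions.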

Findings

01 Achieves 3-5x model size reduction with better AUC.

02 Faster inference speeds on benchmark datasets.

03 Applicable to embedding tables and MLP layers.
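As a rough illustration of how a low-rank factorization shrinks an embedding table, the sketch below counts parameters before and after replacing a `vocab_size x dim` table with a `vocab_size x rank` table followed by a `rank x dim` projection. This is a generic factorization used only for the arithmetic; the helper name and the example sizes are hypothetical and not taken from the paper.

```python
def low_rank_embedding_params(vocab_size, dim, rank):
    """Return (dense, factored, ratio) parameter counts for an embedding table.

    dense:    vocab_size * dim               (original table)
    factored: vocab_size * rank + rank * dim (thin table plus projection)
    ratio:    dense / factored               (compression ratio)
    """
    dense = vocab_size * dim
    factored = vocab_size * rank + rank * dim
    return dense, factored, dense / factored


# Hypothetical sizes, chosen only to show the arithmetic.
dense, factored, ratio = low_rank_embedding_params(vocab_size=1000, dim=64, rank=16)
print(dense, factored, round(ratio, 2))
```

Because the vocabulary dimension dominates, the ratio approaches `dim / rank` as `vocab_size` grows.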

Abstract

Deep Click-Through Rate (CTR) prediction models play an important role in modern industrial recommendation scenarios. However, high memory overhead and computational costs limit their deployment in resource-constrained environments. Low-rank approximation is an effective compression method for computer vision and natural language processing models, but its application to CTR prediction models has been less explored. Because memory and computing resources are limited, compressing CTR prediction models confronts three fundamental challenges: (1) How to reduce model size to fit edge devices? (2) How to speed up CTR prediction model inference? (3) How to retain the capabilities of the original models after compression? Previous low-rank compression research mostly uses tensor decomposition, which can achieve a high parameter compression ratio, but brings in AUC…

Code & Models

Repositories

yuhao318/atomic_feature_mimicking (PyTorch, official)

Taxonomy

Topics: Image and Video Quality Assessment · Advanced Data Compression Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings