SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance

Li Lyna Zhang; Youkow Homma; Yujing Wang; Min Wu; Mao Yang; Ruofei Zhang; Ting Cao; Wei Shen

arXiv:2209.00625·cs.IR·September 2, 2022·1 cites

Open Access

TL;DR

This paper introduces SwiftPruner, a reinforced evolutionary search framework for structured pruning of BERT models. It enables real-time ad relevance inference for cold start ads on CPU, achieving higher accuracy than uniform-sparsity pruning while meeting the latency constraint.

Contribution

It proposes a novel reinforcement learning approach with a latency-aware reward to optimize layer-wise sparsity in BERT for low-latency ad relevance tasks.
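
As a rough illustration of what a latency-aware reward can look like, the sketch below scores a pruned candidate by its accuracy and subtracts a penalty when measured latency exceeds a target. The function name, the `penalty` weight, and the linear overshoot term are assumptions made for this example; the paper's actual reward may be defined differently.

```python
def latency_aware_reward(accuracy, latency_ms, target_ms, penalty=1.0):
    """Hypothetical reward: accuracy minus a penalty for exceeding the target.

    This is an illustrative form, not the reward defined in the paper.
    """
    # Relative overshoot is zero when the candidate meets the latency target.
    overshoot = max(0.0, latency_ms / target_ms - 1.0)
    return accuracy - penalty * overshoot


# Two made-up candidates: the faster one wins despite lower accuracy,
# because the slower one misses the 10 ms target by 50%.
fast = latency_aware_reward(accuracy=0.88, latency_ms=9.0, target_ms=10.0)
slow = latency_aware_reward(accuracy=0.92, latency_ms=15.0, target_ms=10.0)
```

With a reward of this shape, candidates that meet the target compete on accuracy alone, while candidates that miss it are pushed down in proportion to the miss.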

Findings

01. Achieves higher ROC AUC than uniform sparsity baselines.

02. Reduces cold start ad defect ratio by 11.7%.

03. Meets latency constraints with improved model performance.

Abstract

Ad relevance modeling plays a critical role in online advertising systems including Microsoft Bing. To leverage powerful transformers like BERT in this low-latency setting, many existing approaches perform ad-side computations offline. While efficient, these approaches are unable to serve cold start ads, resulting in poor relevance predictions for such ads. This work aims to design a new, low-latency BERT via structured pruning to empower real-time online inference for cold start ads relevance on a CPU platform. Our challenge is that previous methods typically prune all layers of the transformer to a high, uniform sparsity, thereby producing models which cannot achieve satisfactory inference speed with an acceptable accuracy. In this paper, we propose SwiftPruner - an efficient framework that leverages evolution-based search to automatically find the best-performing layer-wise sparse…
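
The idea of searching for layer-wise sparsity under a latency constraint can be sketched with a plain evolutionary loop. Everything below is a toy under stated assumptions: `toy_latency` and `toy_accuracy` are made-up proxies standing in for real CPU measurements and model evaluation, the 12-layer depth is assumed, and mutation is purely random. It is not the authors' implementation.

```python
import random

NUM_LAYERS = 12  # assumed depth for this sketch


def toy_latency(sparsities):
    # Hypothetical proxy: a dense layer costs 1 unit, a pruned layer costs less.
    return sum(1.0 - s for s in sparsities)


def toy_accuracy(sparsities):
    # Hypothetical proxy: pruning costs accuracy quadratically, and earlier
    # layers are weighted more heavily than later ones. Purely illustrative.
    n = len(sparsities)
    return 1.0 - sum(s * s * (n - i) / n for i, s in enumerate(sparsities)) / n


def fitness(sparsities, latency_budget):
    # Candidates over the latency budget are rejected outright.
    if toy_latency(sparsities) > latency_budget + 1e-9:
        return float("-inf")
    return toy_accuracy(sparsities)


def mutate(sparsities, rng, step=0.1):
    # Random mutation: nudge the sparsity of one layer up or down.
    child = list(sparsities)
    i = rng.randrange(len(child))
    child[i] = min(0.9, max(0.0, child[i] + rng.choice([-step, step])))
    return child


def evolve(latency_budget, generations=50, population_size=20, seed=0):
    rng = random.Random(seed)
    # Start from the uniform sparsity that just meets the budget.
    uniform = min(0.9, max(0.0, 1.0 - latency_budget / NUM_LAYERS))
    population = [[uniform] * NUM_LAYERS]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng)
                    for _ in range(population_size)]
        # Elitist selection: parents and children compete, the best survive.
        population = sorted(population + children,
                            key=lambda s: fitness(s, latency_budget),
                            reverse=True)[:population_size]
    return population[0]


best = evolve(latency_budget=6.0)
```

Because selection is elitist and the search is seeded with the uniform-sparsity configuration, the returned layer-wise configuration is never worse than the uniform baseline under the toy accuracy proxy, and it always respects the latency budget.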

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Image and Video Retrieval Techniques · Digital Marketing and Social Media

Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · WordPiece · Layer Normalization · Softmax · Linear Warmup With Linear Decay · Adam