Efficient LLM-based Advertising via Model Compression and Parallel Verification

Wenxin Dong; Chang Gao; Guanghui Yu; Xuewu Jiao; Mingqing Hu; Qiang Fu; Peng Xu; Penghui Wei; Hui Xu; Yue Xing; Shuanglong Li; Lin Liu

arXiv:2605.11582 · cs.CL · May 13, 2026


TL;DR

This paper introduces an efficient framework for deploying large language models in advertising by combining model compression techniques and parallel verification to reduce latency and computational costs.

Contribution

The paper presents an Efficient Generative Targeting framework that integrates adaptive group quantization, layer-adaptive hierarchical sparsification, and prefix-tree parallel verification to speed up LLM inference in advertising.
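Of the techniques named above, group quantization is the easiest to illustrate in isolation. The sketch below is a generic group-wise symmetric quantizer, not the paper's method: the function names, the fixed `group_size`, and the 4-bit default are all assumptions made for illustration, and the page does not describe how the "adaptive" part chooses groups or bit widths. The idea it shows is that each group of weights gets its own scale, so a few large weights do not wash out the precision of the small ones.

```python
from typing import List, Tuple

# Hypothetical helper names; fixed group size and symmetric rounding are
# illustrative assumptions, not details taken from the paper.

def quantize_group(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Quantize one group to signed integers with a per-group scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def quantize_groupwise(
    weights: List[float], group_size: int, bits: int = 4
) -> List[Tuple[List[int], float]]:
    """Split the weights into consecutive groups and quantize each one."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize_groupwise(groups: List[Tuple[List[int], float]]) -> List[float]:
    """Rebuild approximate weights from integer codes and per-group scales."""
    return [code * scale for codes, scale in groups for code in codes]


weights = [0.1, -0.2, 0.05, 0.15, 8.0, -4.0, 2.0, 1.0]
per_group = dequantize_groupwise(quantize_groupwise(weights, group_size=4))
per_tensor = dequantize_groupwise(quantize_groupwise(weights, group_size=8))
```

With a single scale for the whole list (`group_size=8`), the four small weights all round to zero. With two groups they keep most of their precision, at the cost of storing one extra scale.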

Findings

01. Achieves significant speedup in LLM inference for advertising tasks.

02. Maintains acceptable quality levels despite compression and acceleration.

03. Demonstrates effectiveness on two real-world advertising scenarios.

Abstract

Large language models (LLMs) have shown remarkable potential in advertising scenarios such as ad creative generation and targeted advertising. However, deploying LLMs in real-time advertising systems poses significant challenges due to their high inference latency and computational cost. In this paper, we propose an Efficient Generative Targeting framework that integrates adaptive group quantization, layer-adaptive hierarchical sparsification, and prefix-tree parallel verification to accelerate LLM inference while preserving generation quality. Extensive experiments on two real-world advertising scenarios demonstrate that our framework achieves significant speedup with acceptable quality degradation, making it operationally viable for practical deployments.
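The abstract names prefix-tree parallel verification without giving details, so the following is only a minimal sketch of the general idea as it appears in tree-based speculative decoding, under stated assumptions. Draft candidate sequences are merged into a prefix tree so that shared prefixes are checked once, and the target model's own choices decide which branch survives. The names `build_prefix_tree`, `verify`, and `target_next` are hypothetical, the target model is a toy stand-in, and the loop queries the target one step at a time, whereas a real system would score all tree nodes in a single batched forward pass.

```python
from typing import Callable, List, Sequence

# Hypothetical names throughout; the toy target model below is an assumption
# made only so the sketch can run on its own.

def build_prefix_tree(candidates: Sequence[Sequence[int]]) -> dict:
    """Merge candidate token sequences into a nested-dict prefix tree."""
    root: dict = {}
    for seq in candidates:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root


def count_nodes(tree: dict) -> int:
    """Number of distinct draft tokens that need to be verified."""
    return sum(1 + count_nodes(child) for child in tree.values())


def verify(
    tree: dict,
    context: Sequence[int],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Follow the branch that agrees with the target model's greedy choices.

    Returns the accepted draft tokens plus one extra token produced by the
    target itself (a correction on mismatch, a bonus token at a leaf).
    """
    accepted: List[int] = []
    node = tree
    while True:
        tok = target_next(list(context) + accepted)
        accepted.append(tok)
        if tok not in node:
            return accepted
        node = node[tok]


def toy_target(prefix: List[int]) -> int:
    """Stand-in target model: always predicts the previous token plus one."""
    return (prefix[-1] + 1) % 10


drafts = [[2, 3, 4], [2, 3, 9], [2, 7]]
tree = build_prefix_tree(drafts)
result = verify(tree, context=[1], target_next=toy_target)
```

In this toy run the three drafts contain eight tokens in total, but the tree has only five nodes because the prefixes `2` and `2, 3` are shared. The first draft matches the target at every step, so `result` holds its three tokens followed by one bonus token from the target.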
