AIF: Asynchronous Inference Framework for Cost-Effective Pre-Ranking
Zhi Kou, Xiang-Rong Sheng, Shuguang Han, Zhishan Zhao, Yueyao Cheng, Han Zhu, Jian Xu, Bo Zheng

TL;DR
AIF is an asynchronous inference framework that improves the efficiency and reduces the latency of pre-ranking models in recommendation systems by decoupling interaction-independent computations and running them in parallel with other stages.
Contribution
The paper introduces AIF, an asynchronous architecture that reorganizes inference so that interaction-independent components (those that depend on a single user or a single item) are decoupled from real-time prediction. This improves efficiency and leaves room for richer model design.
Findings
Significant latency reduction in pre-ranking tasks.
Improved computational efficiency allowing richer feature sets.
Successful deployment in Taobao's advertising system.
Abstract
In industrial recommendation systems, pre-ranking models based on deep neural networks (DNNs) commonly adopt a sequential execution framework: feature fetching and model forward computation are triggered only after receiving candidates from the upstream retrieval stage. This design introduces inherent bottlenecks, including redundant computations of identical users/items and increased latency due to strictly sequential operations, which jointly constrain the model's capacity and system efficiency. To address these limitations, we propose the Asynchronous Inference Framework (AIF), a cost-effective computational architecture that decouples interaction-independent components, those operating within a single user or item, from real-time prediction. AIF reorganizes the model inference process by performing user-side computations in parallel with the retrieval stage and conducting item-side…
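The scheduling idea in the abstract, running user-side computation in parallel with retrieval instead of after it, can be illustrated with a small sketch. This is not the authors' implementation: every name here (`retrieve_candidates`, `compute_user_vector`, `ITEM_VECTORS`, the dot-product `score`) is hypothetical, the latencies are simulated with sleeps, and the item vectors are simply given as a lookup table because the abstract is truncated before it describes how AIF handles the item side.

```python
import asyncio
import time

# Hypothetical item vectors, assumed to be available already.
# How AIF actually produces item-side results is not covered here.
ITEM_VECTORS = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}


async def retrieve_candidates(user_id: str) -> list[str]:
    """Stand-in for the upstream retrieval stage (latency is simulated)."""
    await asyncio.sleep(0.05)
    return ["a", "b", "c"]


async def compute_user_vector(user_id: str) -> list[float]:
    """Stand-in for user feature fetching plus the user-side forward pass."""
    await asyncio.sleep(0.05)
    return [2.0, 1.0]


def score(user_vec: list[float], item_vec: list[float]) -> float:
    """Toy interaction step: a dot product (an assumption, not the paper's model)."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


async def sequential_pre_rank(user_id: str) -> dict[str, float]:
    # Sequential framework: user-side work starts only after candidates arrive.
    candidates = await retrieve_candidates(user_id)
    user_vec = await compute_user_vector(user_id)
    return {c: score(user_vec, ITEM_VECTORS[c]) for c in candidates}


async def async_pre_rank(user_id: str) -> dict[str, float]:
    # Asynchronous variant: user-side work does not depend on the candidates,
    # so it runs concurrently with retrieval.
    candidates, user_vec = await asyncio.gather(
        retrieve_candidates(user_id),
        compute_user_vector(user_id),
    )
    return {c: score(user_vec, ITEM_VECTORS[c]) for c in candidates}


def timed(coro_fn, user_id: str):
    """Run one pre-ranking coroutine and return (scores, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(user_id))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (sequential_pre_rank, async_pre_rank):
        scores, elapsed = timed(fn, "u1")
        print(f"{fn.__name__}: {scores} in {elapsed:.3f}s")
```

Both functions return the same scores. In this toy setup the sequential version waits for the two simulated delays one after the other, while the asynchronous version overlaps them, which is the source of the latency saving the abstract describes for user-side computation.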
Taxonomy
Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Sentiment Analysis and Opinion Mining
