Efficient Retrieval Scaling with Hierarchical Indexing for Large Scale Recommendation
Dongqi Fu, Kaushik Rangadurai, Haiyu Lu, Yunchen Pu, Siyang Yuan, Minhui Huang, Yiqun Liu, Golnaz Ghasemiesfeh, Xingfeng He, Fangzhou Xu, Andrew Cui, Vidhoon Viswanathan, Lin Yang, Liang Wang, Jiyan Yang, Chonglin Sun

TL;DR
This paper introduces a hierarchical indexing method for large-scale retrieval models, enabling more efficient search while maintaining accuracy, and demonstrates its deployment in Meta's recommendation systems.
Contribution
It proposes a joint learning approach for hierarchical indexes using cross-attention and residual quantization, improving retrieval efficiency and inference performance.
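To make the residual quantization idea concrete, here is a minimal sketch of how an embedding could be mapped to a coarse-to-fine path of codes, one per index level. It is illustrative only: the codebooks are fixed toy values rather than learned jointly with cross-attention as the paper describes, and the names `nearest`, `residual_quantize`, and `reconstruct` are hypothetical, not taken from the paper.

```python
def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared L2 distance)."""
    best, best_d = 0, float("inf")
    for i, codeword in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(vec, codeword))
        if d < best_d:
            best, best_d = i, d
    return best


def residual_quantize(vec, codebooks):
    """Encode vec as one code per level.

    Each level quantizes the residual left over by the previous level,
    so the code list reads as a coarse-to-fine path through the index.
    """
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


def reconstruct(codes, codebooks):
    """Approximate the original vector by summing the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy two-level index (assumed values): level 0 is coarse, level 1 refines it.
codebooks = [
    [[0, 0], [10, 10]],
    [[0, 0], [1, -1]],
]
codes, residual = residual_quantize([10.9, 9.2], codebooks)
approx = reconstruct(codes, codebooks)
```

In this toy setup, items that share the same leading code fall under the same coarse node, which is what would let a search examine one branch instead of scanning every item.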
Findings
Hierarchical index nodes correspond to high-quality data subsets.
Fine-tuning on these subsets enhances inference performance.
The method is successfully deployed at Meta for billions of users.
Abstract
The increase in data volume, computational resources, and model parameters during training has led to the development of numerous large-scale industrial retrieval models for recommendation tasks. However, effectively and efficiently deploying these large-scale foundational retrieval models remains a critical challenge that has not been fully addressed. Common quick-win solutions for deploying these massive models include relying on offline computations (such as cached user dictionaries) or distilling large models into smaller ones. Yet, both approaches fall short of fully leveraging the representational and inference capabilities of foundational models. In this paper, we explore whether it is possible to learn a hierarchical organization over the memory of foundational retrieval models. Such a hierarchical structure would enable more efficient search by reducing retrieval costs while…
