Faster Distributed Inference-Only Recommender Systems via Bounded Lag Synchronous Collectives

Kiril Dichev; Filip Pawlowski; Albert-Jan Yzelman

arXiv:2512.19342·cs.DC·December 23, 2025

Open Access

TL;DR

This paper introduces a bounded lag synchronous (BLS) alltoallv communication method for distributed recommender systems. By letting slower processes lag behind faster ones by a bounded number of iterations, it improves inference latency and throughput when the workload is unbalanced or the access pattern is irregular.

Contribution

The paper proposes a novel bounded lag synchronous (BLS) alltoallv operation that reduces synchronization overhead in distributed DLRMs. It is most effective under unbalanced or irregular access conditions.

Findings

1. Improves latency and throughput in unbalanced DLRM runs
2. Masks process delays in inference-only scenarios
3. No notable advantage in well-balanced runs
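
The contrast between unbalanced and well-balanced runs can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the paper's implementation: the function `makespan`, the per-iteration step times, and the rule that a process may start iteration i only after every process has finished iteration i - lag_bound - 1 are all hypothetical simplifications. With `lag_bound = 0` the model behaves like a fully synchronous collective.

```python
def makespan(step_times, lag_bound):
    """Toy model (assumption, not the paper's code).

    step_times[p][i] is the time process p spends on iteration i.
    A process may start iteration i only after ALL processes have
    finished iteration i - lag_bound - 1, so the fastest process is
    at most lag_bound iterations ahead of the slowest one.
    """
    num_procs = len(step_times)
    num_iters = len(step_times[0])
    finish = [[0.0] * num_iters for _ in range(num_procs)]
    for i in range(num_iters):
        gate = i - lag_bound - 1  # iteration everyone must have completed
        if gate >= 0:
            gate_time = max(finish[q][gate] for q in range(num_procs))
        else:
            gate_time = 0.0
        for p in range(num_procs):
            own_prev = finish[p][i - 1] if i > 0 else 0.0
            finish[p][i] = max(own_prev, gate_time) + step_times[p][i]
    # Total time until the last process finishes the last iteration.
    return max(row[-1] for row in finish)


# Hypothetical workloads: two processes, four iterations each.
unbalanced = [[1, 3, 1, 3], [3, 1, 3, 1]]  # delays alternate between processes
balanced = [[2, 2, 2, 2], [2, 2, 2, 2]]    # every step costs the same

print(makespan(unbalanced, 0), makespan(unbalanced, 1))  # 12.0 8.0
print(makespan(balanced, 0), makespan(balanced, 1))      # 8.0 8.0
```

In this toy model, allowing one iteration of lag lets the alternating delays overlap instead of adding up, while the balanced workload finishes at the same time for either bound.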

Abstract

Recommender systems enable personalized content delivery, and therefore revenue, for many large companies. Over the last decade, deep learning recommender models (DLRMs) have become the de facto standard in this field. The main bottleneck in DLRM inference is the lookup of sparse features across huge embedding tables, which are usually partitioned across the aggregate RAM of many nodes. In state-of-the-art recommender systems, this distributed lookup is implemented via irregular all-to-all (alltoallv) communication, and this communication is often the bottleneck. Today, most related work takes this operation as a given; in addition, every collective is synchronous in nature. In this work, we propose a novel bounded lag synchronous (BLS) version of the alltoallv operation. The bound can be set as a parameter, allowing slower processes to lag behind by entire iterations before the fastest processes block. In…
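
To make the "irregular" part of the distributed lookup concrete, the following minimal sketch computes how many embedding-row requests one rank would send to each other rank. It assumes a simple block partitioning of table rows across ranks; the abstract does not say how the tables are partitioned, and the names `send_counts`, `num_rows`, and `num_ranks` are hypothetical. In an MPI setting, a list like this would typically serve as the per-destination send counts of an alltoallv call.

```python
def send_counts(indices, num_rows, num_ranks):
    """Count lookups per owning rank (assumed block row partitioning)."""
    rows_per_rank = -(-num_rows // num_ranks)  # ceiling division
    counts = [0] * num_ranks
    for idx in indices:
        owner = idx // rows_per_rank  # rank that holds this embedding row
        counts[owner] += 1
    return counts


# Hypothetical batch of sparse feature indices into a 12-row table
# split across 4 ranks (3 rows per rank).
batch = [0, 1, 2, 2, 11, 5]
print(send_counts(batch, num_rows=12, num_ranks=4))  # [4, 1, 0, 1]
```

The counts differ per destination and change with every batch, which is why a fixed-size all-to-all does not fit this access pattern.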

Taxonomy

Topics: Recommender Systems and Techniques · Caching and Content Delivery · Machine Learning in Healthcare