Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates

Wenjun Yu, Sitian Chen, Cheng Chen, and Amelie Chi Zhou

arXiv:2512.12295 · cs.DC · December 18, 2025

TL;DR

This paper introduces LiveUpdate, a system that enables near-zero-overhead, real-time model updates for recommendation systems by colocating low-rank trainers with inference nodes, significantly improving freshness and accuracy.

Contribution

LiveUpdate exploits the low-rank structure of embedding-table gradients and NUMA-aware resource scheduling to eliminate inter-cluster synchronization, enabling continuous online model updates on the inference nodes themselves.

Findings

01. Reduces update costs by 2x compared to delta-update baselines.

02. Achieves higher recommendation accuracy within 1-hour update windows.

03. Transforms idle inference resources into real-time freshness engines.

Abstract

Deep Learning Recommendation Models (DLRMs) underpin personalized services but face a critical freshness-accuracy tradeoff due to massive parameter synchronization overheads. Production DLRMs deploy decoupled training/inference clusters, where synchronizing petabyte-scale embedding tables (EMTs) causes multi-minute staleness, degrading recommendation quality and revenue. We observe that (1) inference nodes exhibit sustained CPU underutilization (peak <= 20%), and (2) EMT gradients possess intrinsic low-rank structure, enabling compact update representation. We present LiveUpdate, a system that eliminates inter-cluster synchronization by colocating Low-Rank Adaptation (LoRA) trainers within inference nodes. LiveUpdate addresses two core challenges: (1) dynamic rank adaptation via singular value monitoring to constrain memory overhead (<2% of EMTs), and (2) NUMA-aware resource scheduling…
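
The low-rank idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the squared-spectrum "energy" criterion for picking a rank, and the example table shape (10 million rows, 128 dimensions) are all assumptions made for illustration. The sketch shows (a) how much extra memory two LoRA-style factors A (rows x rank) and B (rank x dim) cost relative to the full embedding table, (b) the largest rank that fits a given memory budget such as the 2% figure quoted above, (c) one plausible way to choose a rank from monitored singular values, and (d) how a lookup would combine a frozen row with its low-rank correction.

```python
from typing import List, Sequence


def lora_overhead(num_rows: int, dim: int, rank: int) -> float:
    """Memory of factors A (num_rows x rank) and B (rank x dim),
    expressed as a fraction of the full num_rows x dim table."""
    return rank * (num_rows + dim) / (num_rows * dim)


def max_rank_within_budget(num_rows: int, dim: int, budget: float) -> int:
    """Largest rank whose factor memory stays at or below `budget`
    (a fraction of the table size, e.g. 0.02 for 2%)."""
    return int(budget * num_rows * dim / (num_rows + dim))


def choose_rank(singular_values: Sequence[float], energy: float, rank_cap: int) -> int:
    """Smallest rank that retains `energy` of the squared spectrum,
    clipped to [1, rank_cap]. The energy criterion is an assumption;
    the abstract only says the rank is adapted via singular value monitoring."""
    total = sum(s * s for s in singular_values)
    if total == 0.0:
        return 1
    kept = 0.0
    for k, s in enumerate(singular_values, start=1):
        kept += s * s
        if kept >= energy * total:
            return max(1, min(k, rank_cap))
    return max(1, min(len(singular_values), rank_cap))


def lookup(base_row: Sequence[float], a_row: Sequence[float],
           b: Sequence[Sequence[float]]) -> List[float]:
    """Effective embedding: frozen base row plus the correction a_row @ b."""
    return [
        base_row[j] + sum(a_row[k] * b[k][j] for k in range(len(a_row)))
        for j in range(len(base_row))
    ]


if __name__ == "__main__":
    rows, dim = 10_000_000, 128           # hypothetical table shape
    cap = max_rank_within_budget(rows, dim, 0.02)
    print("rank cap under a 2% budget:", cap)
    print("overhead at that rank:", lora_overhead(rows, dim, cap))
    print("chosen rank:", choose_rank([10.0, 3.0, 1.0, 0.5], 0.95, cap))
    print("patched row:", lookup([1.0, 2.0], [0.5], [[2.0, 4.0]]))
```

For a tall table the overhead is roughly rank / dim, which is why a tight memory budget translates into a small rank cap and why the rank has to be chosen carefully rather than fixed generously.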

Taxonomy

Topics: Recommender Systems and Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications