Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates
Wenjun Yu, Sitian Chen, Cheng Chen, and Amelie Chi Zhou

TL;DR
This paper introduces LiveUpdate, a system that enables near-zero-overhead, real-time model updates for recommendation systems by colocating low-rank trainers with inference nodes, significantly improving freshness and accuracy.
Contribution
LiveUpdate leverages low-rank structure and resource scheduling to eliminate synchronization overhead, enabling continuous online model updates in recommendation systems.
Findings
Reduces update costs by 2x compared to delta-update baselines.
Achieves higher recommendation accuracy within 1-hour update windows.
Transforms idle inference resources into real-time freshness engines.
Abstract
Deep Learning Recommendation Models (DLRMs) underpin personalized services but face a critical freshness-accuracy tradeoff due to massive parameter synchronization overheads. Production DLRMs deploy decoupled training/inference clusters, where synchronizing petabyte-scale embedding tables (EMTs) causes multi-minute staleness, degrading recommendation quality and revenue. We observe that (1) inference nodes exhibit sustained CPU underutilization (peak <= 20%), and (2) EMT gradients possess intrinsic low-rank structure, enabling compact update representation. We present LiveUpdate, a system that eliminates inter-cluster synchronization by colocating Low-Rank Adaptation (LoRA) trainers within inference nodes. LiveUpdate addresses two core challenges: (1) dynamic rank adaptation via singular value monitoring to constrain memory overhead (<2% of EMTs), and (2) NUMA-aware resource scheduling…
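The abstract's two ideas, compact low-rank representation of EMT gradients and rank selection by monitoring singular values, can be illustrated with a small NumPy sketch. This is not LiveUpdate's implementation: the helper names choose_rank and low_rank_factors, the energy threshold, and the synthetic gradient are all assumptions made for illustration, and an offline truncated SVD stands in for whatever online LoRA training the system actually performs.

```python
import numpy as np


def choose_rank(singular_values, energy=0.9):
    """Smallest rank whose leading singular values hold `energy` of the squared spectrum.

    Hypothetical rank-selection rule, assumed here for illustration only.
    """
    s2 = np.asarray(singular_values, dtype=float) ** 2
    total = s2.sum()
    if total == 0.0:
        return 1
    cumulative = np.cumsum(s2) / total
    rank = int(np.searchsorted(cumulative, energy)) + 1
    # Guard against floating-point round-off pushing the index past the end.
    return min(rank, len(s2))


def low_rank_factors(grad, rank):
    """Truncated SVD so that grad is approximated by A @ B.

    A has shape (rows, rank) and B has shape (rank, dim).
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]


# Synthetic "embedding gradient": rank-4 signal plus small noise (assumed data).
rng = np.random.default_rng(0)
rows, dim, true_rank = 1000, 64, 4
grad = rng.normal(size=(rows, true_rank)) @ rng.normal(size=(true_rank, dim))
grad += 1e-3 * rng.normal(size=(rows, dim))

# Monitor the spectrum, pick a rank, and build the compact update.
spectrum = np.linalg.svd(grad, compute_uv=False)
rank = choose_rank(spectrum, energy=0.99)
A, B = low_rank_factors(grad, rank)

# Relative reconstruction error and storage cost relative to the dense gradient.
rel_err = np.linalg.norm(grad - A @ B) / np.linalg.norm(grad)
overhead = (A.size + B.size) / grad.size
print(f"rank={rank} rel_err={rel_err:.2e} overhead={overhead:.2%}")
```

The factors A and B together store rank * (rows + dim) values instead of rows * dim, which is why a low rank keeps the update small; the ratio obtained on this toy matrix says nothing about the memory figure reported in the abstract.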
Taxonomy
Topics: Recommender Systems and Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
