Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems
Xiaoshuang Chen, Gengrui Zhang, Yao Wang, Yulin Wu, Shuo Su, Kaiqiao Zhan, Ben Wang

TL;DR
This paper introduces a cache-aware reinforcement learning method for large-scale recommender systems that optimizes recommendations by balancing real-time computation and cache usage, significantly improving user engagement.
Contribution
It proposes a novel reinforcement learning framework that jointly optimizes real-time and cached recommendations, addressing cache-induced critic dependency issues.
Findings
CARL improves user engagement in large-scale systems.
It is deployed in the Kwai app, which serves over 100 million users.
The eigenfunction learning method effectively mitigates critic dependency.
Abstract
Modern large-scale recommender systems are built upon computation-intensive infrastructure and usually suffer from a huge difference in traffic between peak and off-peak periods. In peak periods, it is challenging to perform real-time computation for each request due to the limited budget of computational resources. The recommendation with a cache is a solution to this problem, where a user-wise result cache is used to provide recommendations when the recommender system cannot afford a real-time computation. However, the cached recommendations are usually suboptimal compared to real-time computation, and it is challenging to determine the items in the cache for each user. In this paper, we provide a cache-aware reinforcement learning (CARL) method to jointly optimize the recommendation by real-time computation and by the cache. We formulate the problem as a Markov decision process with…
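The cache fallback described in the abstract can be pictured with a minimal Python sketch. This is not the CARL method itself, only the baseline serving pattern it builds on. The names `serve_request`, `rank_realtime`, `has_budget`, and `page_size` are hypothetical, and the rule for filling the cache (storing the unshown remainder of the last real-time ranking) is an assumption made for illustration, not something stated in the abstract.

```python
from typing import Callable, Dict, List


def serve_request(
    user_id: str,
    has_budget: bool,
    rank_realtime: Callable[[str], List[str]],
    cache: Dict[str, List[str]],
    page_size: int = 2,
) -> List[str]:
    """Return one page of items for user_id (hypothetical helper)."""
    if has_budget:
        # Enough compute budget: run the real-time ranking.
        ranked = rank_realtime(user_id)
        # Assumption: the unshown remainder becomes this user's result cache.
        cache[user_id] = ranked[page_size:]
        return ranked[:page_size]
    # No budget (e.g. peak traffic): consume one page from the user-wise cache.
    cached = cache.get(user_id, [])
    cache[user_id] = cached[page_size:]
    return cached[:page_size]
```

In this sketch a request served from the cache can only show items chosen at an earlier time, which is one way to see why cached recommendations tend to be suboptimal and why the choice of what to cache matters.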
Taxonomy
Topics: Recommender Systems and Techniques · Caching and Content Delivery · Advanced Bandit Algorithms Research
