Efficient Dataset Selection for Continual Adaptation of Generative Recommenders
Cathy Jiao, Juan Elenter, Praveen Ravichandran, Bernd Huber, Joseph Cauteruccio, Todd Wasson, Timothy Heath, Chenyan Xiong, Mounia Lalmas, Paul Bennett

TL;DR
This paper explores efficient data selection methods to improve continual adaptation of recommendation systems, focusing on targeted sampling strategies that enhance performance while reducing training costs.
Contribution
It combines gradient-based representations of training examples with distribution-matching sampling to curate data for scalable recommendation system updates.
Findings
Gradient-based representations improve model performance.
Distribution-matching enhances data selection effectiveness.
Targeted data selection maintains robustness to user behavior drift.
Abstract
Recommendation systems must continuously adapt to evolving user behavior, yet the volume of data generated in large-scale streaming environments makes frequent full retraining impractical. This work investigates how targeted data selection can mitigate performance degradation caused by temporal distributional drift while maintaining scalability. We evaluate a range of representation choices and sampling strategies for curating small but informative subsets of user interaction data. Our results demonstrate that gradient-based representations, coupled with distribution-matching, improve downstream model performance, achieving training efficiency gains while preserving robustness to drift. These findings highlight data curation as a practical mechanism for scalable monitoring and adaptive model updates in production-scale recommendation systems.
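The abstract does not spell out the selection procedure. The sketch below is a rough, hypothetical illustration of what "gradient-based representations coupled with distribution-matching" can mean. Each interaction is represented by its per-example loss gradient under a toy logistic model. The sketch then greedily picks a subset from an older pool whose mean gradient is closest to the mean gradient of a recent target window. This amounts to minimizing a linear-kernel maximum mean discrepancy, in the spirit of kernel herding. The toy model, the names `logistic_grad`, `mean`, and `greedy_mean_match`, and the mean-matching criterion are all assumptions made for illustration. It is not the authors' method.

```python
import math
from typing import List, Sequence

Vector = List[float]


def logistic_grad(w: Sequence[float], x: Sequence[float], y: float) -> Vector:
    """Per-example gradient of binary log-loss for a linear logit model.

    Hypothetical stand-in for a recommender's gradient features:
    d/dw [-y log p - (1 - y) log(1 - p)] = (p - y) * x, with p = sigmoid(w . x).
    """
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(d)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def greedy_mean_match(
    candidates: Sequence[Sequence[float]], target_mean: Sequence[float], k: int
) -> List[int]:
    """Greedily choose k candidate indices (without replacement) so that the
    mean of the chosen vectors stays close to target_mean.

    With a linear kernel, squared MMD between two samples equals the squared
    distance between their means, so this is a greedy linear-kernel MMD match.
    Ties are broken toward the lowest index.
    """
    if not 0 < k <= len(candidates):
        raise ValueError("k must be between 1 and len(candidates)")
    selected: List[int] = []
    running_sum = [0.0] * len(target_mean)
    remaining = set(range(len(candidates)))
    for step in range(1, k + 1):
        best_i, best_cost = -1, math.inf
        for i in sorted(remaining):
            # Mean of the subset if candidate i were added at this step.
            trial = [(s + c) / step for s, c in zip(running_sum, candidates[i])]
            cost = sq_dist(trial, target_mean)
            if cost < best_cost:
                best_i, best_cost = i, cost
        selected.append(best_i)
        remaining.remove(best_i)
        running_sum = [s + c for s, c in zip(running_sum, candidates[best_i])]
    return selected


if __name__ == "__main__":
    w = [0.5, -0.25]  # hypothetical current model weights
    # Hypothetical (features, label) pairs: an older pool and a recent window.
    pool = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0), ([-1.0, 0.5], 0.0)]
    recent = [([1.0, 0.2], 1.0), ([0.1, 1.0], 0.0)]
    pool_grads = [logistic_grad(w, x, y) for x, y in pool]
    target = mean([logistic_grad(w, x, y) for x, y in recent])
    print(greedy_mean_match(pool_grads, target, k=2))
```

Matching only the mean is the simplest distribution-matching criterion. A nonlinear kernel such as an RBF kernel would also match higher moments, at a higher cost per step. For a large generative recommender, per-example gradients are very high-dimensional, so they would likely need to be compressed, for example by random projection, before a selection step like this is practical.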
