Drifter: Efficient Online Feature Monitoring for Improved Data Integrity in Large-Scale Recommendation Systems
Blaž Škrlj, Nir Ki-Tov, Lee Edelist, Natalia Silberstein, Hila Weisman-Zohar, Blaž Mramor, Davorin Kopič, Naama Ziporin

TL;DR
Drifter is a lightweight, scalable online feature monitoring system that enhances data quality and system reliability in large-scale recommendation platforms through real-time detection and root cause analysis.
Contribution
It introduces a novel, resource-efficient system combining online feature ranking and anomaly detection for real-time data quality monitoring in recommendation systems.
Findings
Effective in alerting and mitigating data quality issues
Requires minimal resources: only two threads and under 1 GB of RAM per production deployment
Significantly improves system reliability and performance
Abstract
Real-world production systems often grapple with maintaining data quality in large-scale, dynamic streams. We introduce Drifter, an efficient and lightweight system for online feature monitoring and verification in recommendation use cases. Drifter addresses limitations of existing methods by delivering agile, responsive, and adaptable data quality monitoring, enabling real-time root cause analysis, drift detection, and insights into problematic production events. Integrating state-of-the-art online feature ranking for sparse data and anomaly detection ideas, Drifter is highly scalable and resource-efficient, requiring only two threads and less than a gigabyte of RAM per production deployment handling millions of instances per minute. Evaluation on real-world data sets demonstrates Drifter's effectiveness in alerting and mitigating data quality issues, substantially improving…
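The abstract does not describe Drifter's algorithms, so the following is only an illustrative sketch of the general idea of online feature monitoring with constant memory per feature. It is not the authors' method. It assumes a simple z-score rule over running statistics (Welford's algorithm), and the names RunningStats, FeatureDriftMonitor, z_threshold, and warmup are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance: constant memory per feature."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class FeatureDriftMonitor:
    """Hypothetical monitor (an assumption, not Drifter's actual design):
    flags a value whose z-score against the running baseline is too large."""

    def __init__(self, z_threshold: float = 4.0, warmup: int = 30):
        self.z_threshold = z_threshold
        self.warmup = warmup  # observations required before alerting
        self.stats = {}       # feature name -> RunningStats

    def observe(self, feature: str, value: float) -> bool:
        s = self.stats.setdefault(feature, RunningStats())
        is_anomaly = False
        if s.n >= self.warmup and s.std > 0:
            is_anomaly = abs(value - s.mean) / s.std > self.z_threshold
        if not is_anomaly:
            # Keep flagged values out of the baseline so one spike
            # does not mask the next one.
            s.update(value)
        return is_anomaly


monitor = FeatureDriftMonitor()
for i in range(50):
    monitor.observe("click_rate", 10.0 + (i % 2))  # stable stream
spike_flagged = monitor.observe("click_rate", 100.0)  # sudden jump
```

Because each feature keeps only three numbers, memory grows with the number of monitored features rather than with stream length, which is the kind of property a lightweight monitor needs. A production system would also need to handle sparse and categorical features, which this sketch ignores.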
Taxonomy
Topics: Data Stream Mining Techniques · Anomaly Detection Techniques and Applications · Mobile Crowdsensing and Crowdsourcing
