A Distributed Real-Time Recommender System for Big Data Streams
Heidy Hazem, Ahmed Awad, Ahmed Hassan

TL;DR
This paper introduces a distributed streaming recommender system architecture that improves scalability, latency, and accuracy for big data streams by leveraging a splitting and replication mechanism inspired by shared-nothing architecture, implemented on Apache Flink.
Contribution
It proposes a novel distributed architecture for streaming recommender systems that addresses scalability, concept drift, and real-time processing, extending existing methods to handle big data volumes.
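The summary names a splitting and replication mechanism but does not spell out how interactions are assigned to workers. The sketch below is one plausible reading, written in Python for illustration only; it is not the authors' Flink implementation. It assumes workers form a hypothetical grid of `user_splits` by `item_splits`, that each (user, item) interaction is routed to exactly one worker, and that a user's state is therefore replicated across one row of the grid. The function names `stable_hash`, `route`, `replicas_of_user`, and `partition` are invented for this example.

```python
import zlib
from collections import defaultdict


def stable_hash(key: str) -> int:
    """Process-independent hash (Python's built-in hash() is salted for str)."""
    return zlib.crc32(key.encode("utf-8"))


def route(user_id: str, item_id: str, user_splits: int, item_splits: int) -> int:
    """Map one interaction to one worker in an assumed user_splits x item_splits grid."""
    row = stable_hash(user_id) % user_splits   # splitting by user
    col = stable_hash(item_id) % item_splits   # splitting by item
    return row * item_splits + col


def replicas_of_user(user_id: str, user_splits: int, item_splits: int) -> list:
    """Workers that may hold state for this user: one full row of the grid."""
    row = stable_hash(user_id) % user_splits
    return [row * item_splits + col for col in range(item_splits)]


def partition(stream, user_splits: int, item_splits: int) -> dict:
    """Group a stream of (user_id, item_id) pairs by destination worker."""
    workers = defaultdict(list)
    for user_id, item_id in stream:
        worker = route(user_id, item_id, user_splits, item_splits)
        workers[worker].append((user_id, item_id))
    return workers


if __name__ == "__main__":
    events = [("u1", "i1"), ("u2", "i7"), ("u1", "i3")]
    for worker, batch in sorted(partition(events, 2, 3).items()):
        print(worker, batch)
```

Under these assumptions each worker can train on its own slice without consulting the others, which is the shared-nothing property the summary refers to. The cost is that a user's or item's state exists in several places, so recommendations for one user would have to be gathered from every worker in that user's row.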
Findings
40% improvement in online recall
Over 50% reduction in memory consumption
Lower processing latency and higher throughput
Abstract
In today's data-driven world, recommender systems (RS) play a crucial role in supporting the decision-making process. As users become continuously connected to the internet, they become less patient and less tolerant of obsolete recommendations made by an RS, e.g., movie recommendations on Netflix or books to read on Amazon. This, in turn, requires continuous training of the RS to cope with both the online fashion of data and the changing nature of user tastes and interests, known as concept drift. A streaming (online) RS has to address three requirements: continuous training and recommendation, handling concept drift, and the ability to scale. Streaming recommender systems proposed in the literature mostly address the first two requirements and do not consider scalability, because they run the training process on a single machine. Such a machine, no matter how powerful it is, will…
Taxonomy
Topics: Recommender Systems and Techniques · Caching and Content Delivery · Advanced Bandit Algorithms Research
