The 2021 RecSys Challenge Dataset: Fairness is not optional
Luca Belli, Alykhan Tejani, Frank Portman, Alexandre Lung-Yut-Fong, Ben Chamberlain, Yuanpu Xie, Kristian Lum, Jonathan Hunt, Michael Bronstein, Vito Walter Anelli, Saikishore Kalloori, Bruce Ferwerda, Wenzhe Shi

TL;DR
This paper introduces a large, Twitter-synced dataset for the 2021 RecSys Challenge, emphasizing fairness considerations and dynamic data updates to better reflect real-world recommender system challenges.
Contribution
It presents a significantly larger, fairness-aware dataset that is synchronized with Twitter platform changes, addressing challenges of real-time data updates in recommender systems.
Findings
Dataset size increased fivefold to ~1 billion data points.
Incorporation of fairness considerations into dataset design.
Dynamic synchronization with Twitter platform updates.
Abstract
After the success of the RecSys 2020 Challenge, we describe a novel and bigger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year's dataset is not only bigger (~1B data points, a 5-fold increase), but for the first time it takes into consideration fairness aspects of the challenge. Unlike many static datasets, a lot of effort went into making sure that the dataset was synced with the Twitter platform: if a user deleted their content, the same content would be promptly removed from the dataset too. In this paper, we introduce the dataset and challenge, highlighting some of the issues that arise when creating recommender systems at Twitter scale.
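The deletion-sync requirement described in the abstract can be illustrated with a minimal sketch. This is not the challenge's actual pipeline: the `Engagement` record, its field names, and the `drop_deleted` helper are all hypothetical, and the sketch assumes deletions arrive as a set of Tweet identifiers.

```python
from typing import Iterable, List, NamedTuple, Set


class Engagement(NamedTuple):
    """Hypothetical record layout; the real dataset schema may differ."""
    tweet_id: str
    user_id: str
    liked: bool


def drop_deleted(records: Iterable[Engagement],
                 deleted_tweet_ids: Set[str]) -> List[Engagement]:
    """Return only the records whose Tweet has not been deleted."""
    return [r for r in records if r.tweet_id not in deleted_tweet_ids]


# Toy data: three engagement records, one of which refers to a deleted Tweet.
records = [
    Engagement("t1", "u1", True),
    Engagement("t2", "u2", False),
    Engagement("t3", "u1", True),
]
deleted = {"t2"}  # assumed to come from a periodic deletion feed

synced = drop_deleted(records, deleted)
```

Under these assumptions, re-running the filter whenever new deletions arrive keeps a local copy consistent with the platform. Set membership keeps each lookup constant time on average, which matters at the ~1B-record scale the abstract mentions.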
Taxonomy
Topics: Ethics and Social Impacts of AI · Advanced Graph Neural Networks · Mobile Crowdsensing and Crowdsourcing
