A Large-Scale Web Search Dataset for Federated Online Learning to Rank
Marcel Gregoriadis, Jingwei Kang, Johan Pouwelse

TL;DR
This paper introduces AOL4FOLTR, a large-scale web search dataset for federated online learning to rank (FOLTR). It contains real user identifiers, queries, click data, and timestamps, addressing the limitations of earlier benchmarks that rely on random partitioning and simulated clicks.
Contribution
We present AOL4FOLTR, a large-scale web search dataset with user identifiers, real click data, and query timestamps, enabling more realistic experiments and benchmarks for federated online learning to rank.
Findings
Provides 2.6 million queries from 10,000 users with real click data.
Enables realistic user partitioning and asynchronous federated learning scenarios.
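Addresses limitations of previous benchmarks, which rely on random partitioning of classical learning-to-rank datasets, simulated user clicks, and synchronous client participation.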
Abstract
The centralized collection of search interaction logs for training ranking models raises significant privacy concerns. Federated Online Learning to Rank (FOLTR) offers a privacy-preserving alternative by enabling collaborative model training without sharing raw user data. However, benchmarks in FOLTR are largely based on random partitioning of classical learning-to-rank datasets, simulated user clicks, and the assumption of synchronous client participation. This oversimplifies real-world dynamics and undermines the realism of experimental results. We present AOL4FOLTR, a large-scale web search dataset with 2.6 million queries from 10,000 users. Our dataset addresses key limitations of existing benchmarks by including user identifiers, real click data, and query timestamps, enabling realistic user partitioning, behavior modeling, and asynchronous federated learning scenarios.
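To illustrate what user identifiers and query timestamps make possible, the following minimal sketch partitions a small search log by user and derives the order in which clients would become active. It is not the authors' code: the `LogRecord` fields, the helper names, and the sample records are hypothetical and do not reflect the actual AOL4FOLTR schema.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    # Hypothetical schema; the real dataset's field names may differ.
    user_id: str
    timestamp: int  # e.g. seconds since some reference point
    query: str
    clicked_doc: str


def partition_by_user(records):
    """Group records by user so that each user acts as one federated client."""
    clients = defaultdict(list)
    for record in records:
        clients[record.user_id].append(record)
    # Keep each client's interactions in chronological order.
    for log in clients.values():
        log.sort(key=lambda r: r.timestamp)
    return dict(clients)


def client_activity_order(clients):
    """Return user ids in the order their interactions occurred.

    Replaying this sequence approximates asynchronous participation,
    in contrast to having every client train in every round.
    """
    events = [(r.timestamp, uid) for uid, log in clients.items() for r in log]
    events.sort()
    return [uid for _, uid in events]


# Made-up sample records for illustration only.
records = [
    LogRecord("u2", 30, "weather", "d7"),
    LogRecord("u1", 10, "news", "d1"),
    LogRecord("u1", 40, "maps", "d3"),
    LogRecord("u2", 20, "recipes", "d5"),
]

clients = partition_by_user(records)
order = client_activity_order(clients)
```

With random partitioning, by contrast, the records would be shuffled and assigned to clients without regard to who issued them or when, which is the simplification the abstract criticizes.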
