Reproducing Personalised Session Search over the AOL Query Log
Sean MacAvaney, Craig Macdonald, Iadh Ounis

TL;DR
This study compares different versions of the AOL query log corpus, showing that using a temporally aligned corpus improves session search performance and better reflects the original data.
Contribution
The paper introduces a new corpus version aligned with the original AOL log timeframe, demonstrating its impact on session search experiments and performance.
Findings
The new corpus covers 93% of the documents present in the original log, compared with 55% for the 2017 version.
Performance improvements in session search when using the temporally aligned corpus.
Including URLs enhances model performance, confirming the navigational nature of queries.
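The coverage figures above can be read as a set-overlap ratio. The following is a minimal sketch of that idea, assuming coverage means the share of distinct logged URLs that also exist in a given corpus snapshot. The paper may define it differently, for example over clicked documents. The function name `coverage` and all URLs are hypothetical toy values, not data from the study.

```python
def coverage(log_urls, corpus_urls):
    """Fraction of distinct URLs in the log that also appear in the corpus.

    Assumption: coverage is computed over distinct URLs; this is an
    illustrative definition, not necessarily the one used in the paper.
    """
    log_set = set(log_urls)
    if not log_set:
        return 0.0
    return len(log_set & set(corpus_urls)) / len(log_set)


# Toy, made-up data: URLs seen in a log and in two corpus snapshots.
log_urls = ["a.example", "b.example", "c.example", "d.example", "a.example"]
corpus_near_log_time = {"a.example", "b.example", "c.example"}
corpus_later_crawl = {"a.example", "d.example", "e.example"}

cov_near = coverage(log_urls, corpus_near_log_time)
cov_later = coverage(log_urls, corpus_later_crawl)
print(f"near log time: {cov_near:.0%}, later crawl: {cov_later:.0%}")
```

Deduplicating the log first keeps frequently clicked pages from dominating the ratio; weighting by click count would be an equally reasonable alternative.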
Abstract
Despite its troubled past, the AOL Query Log continues to be an important resource to the research community -- particularly for tasks like search personalisation. When using the query log for these ranking experiments, little attention is usually paid to the document corpus. Recent work typically uses a corpus containing versions of the documents collected long after the log was produced. Given that web documents are prone to change over time, we study the differences present between a version of the corpus containing documents as they appeared in 2017 (which has been used by several recent works) and a new version we construct that includes documents close to as they appeared at the time the query log was produced (2006). We demonstrate that this new version of the corpus has a far higher coverage of documents present in the original log (93%) than the 2017 version (55%). Among the…
