TripClick: The Log Files of a Large Health Web Search Engine

Navid Rekabsaz; Oleg Lesota; Markus Schedl; Jon Brassey; Carsten Eickhoff

arXiv:2103.07901·cs.IR·April 29, 2021

TL;DR

This paper introduces TripClick, a large-scale click log dataset from the Trip Database health web search engine, and uses it to build an IR evaluation benchmark suited to training neural IR models for health-related search.

Contribution

The paper releases a large-scale, health-domain click log dataset and builds a standard IR evaluation benchmark, TripClick, from it, supporting the development of neural IR models for health search.

Findings

01 Neural IR models outperform classical models on the benchmark.

02 The dataset enables training of neural models with many parameters.

03 Performance gains are especially notable for frequent queries.

Abstract

Click logs are valuable resources for a variety of information retrieval (IR) tasks. These include query understanding/analysis, as well as learning effective IR models, particularly when the models require large amounts of training data. We release a large-scale domain-specific dataset of click logs, obtained from user interactions of the Trip Database health web search engine. Our click log dataset comprises approximately 5.2 million user interactions collected between 2013 and 2020. We use this dataset to create a standard IR evaluation benchmark -- TripClick -- with around 700,000 unique free-text queries and 1.3 million pairs of query-document relevance signals, whose relevance is estimated by two click-through models. As such, the collection is one of the few datasets offering the necessary data richness and scale to train neural IR models with a large number of parameters, and…
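
The abstract says relevance is estimated by two click-through models but does not name them here. As a rough illustration only, the sketch below shows one simple way a click-through signal could be turned into a query-document relevance estimate: the fraction of impressions of a document under a query that led to a click. The log record layout and the function name `estimate_relevance` are hypothetical and do not reflect the TripClick file format or the authors' models.

```python
from collections import defaultdict


def estimate_relevance(log_entries):
    """Estimate relevance as a per-query click-through rate.

    log_entries: iterable of (query, shown_doc_ids, clicked_doc_ids)
    tuples. This record layout is an assumption for illustration.
    Returns a dict mapping (query, doc_id) to clicks / impressions.
    """
    shown = defaultdict(int)    # impressions per (query, doc)
    clicked = defaultdict(int)  # clicks per (query, doc)

    for query, shown_docs, clicked_docs in log_entries:
        clicked_set = set(clicked_docs)
        for doc in shown_docs:
            shown[(query, doc)] += 1
            if doc in clicked_set:
                clicked[(query, doc)] += 1

    # Every key in `shown` has at least one impression, so no division by zero.
    return {pair: clicked[pair] / count for pair, count in shown.items()}


# Toy log with made-up queries and document IDs.
toy_log = [
    ("asthma", ["d1", "d2"], ["d1"]),
    ("asthma", ["d1", "d2"], []),
    ("asthma", ["d1", "d3"], ["d1", "d3"]),
]
scores = estimate_relevance(toy_log)
```

A rate like this could then be thresholded or bucketed into graded relevance labels; how that mapping is chosen is a design decision this sketch leaves open.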

Code & Models

Repositories

tripdatabase/tripclick (Official)