Establishing Strong Baselines for TripClick Health Retrieval

Sebastian Hofstätter; Sophia Althammer; Mete Sertkan; Allan Hanbury

arXiv:2201.00365·cs.IR·January 4, 2022

TL;DR

This paper introduces strong Transformer-based re-ranking and dense retrieval baselines for TripClick health retrieval. It cleans the noisy training data with a simple negative sampling policy, achieves large gains over BM25, and studies the impact of domain-specific pre-trained models.

Contribution

The paper provides strong new baselines for TripClick health retrieval using Transformer-based re-ranking and dense retrieval, trained on data improved by a simple negative sampling policy, together with an analysis of domain-specific pre-trained models.

Findings

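01. Dense retrieval outperforms BM25 by considerable margins, even with simple training procedures.

02. Re-sampling the noisy training data with a simple negative sampling policy yields large re-ranking gains over BM25.

03. The choice of domain-specific pre-trained model affects retrieval effectiveness.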

Abstract

We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection. We improve the originally too noisy training data with a simple negative sampling policy. We achieve large gains over BM25 in the re-ranking task of TripClick, which were not achieved with the original baselines. Furthermore, we study the impact of different domain-specific pre-trained models on TripClick. Finally, we show that dense retrieval outperforms BM25 by considerable margins, even with simple training procedures.
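
The abstract does not spell out the negative sampling policy, so the following is only a minimal sketch of one plausible variant, not the authors' procedure. It assumes negatives are drawn from BM25 candidates for a query and that any document with a recorded click for that query is excluded from the negative pool. The function name, parameters, and data layout are all hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def build_training_triples(
    bm25_candidates: Dict[str, List[str]],
    clicked_docs: Dict[str, Set[str]],
    negatives_per_positive: int = 1,
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Build (query_id, positive_doc_id, negative_doc_id) triples.

    Assumption (not taken from the paper): a candidate is a valid
    negative only if it was never clicked for the same query.
    """
    rng = random.Random(seed)
    triples: List[Tuple[str, str, str]] = []
    for query_id in sorted(bm25_candidates):
        positives = clicked_docs.get(query_id, set())
        # Drop clicked documents so they are never used as negatives.
        pool = [d for d in bm25_candidates[query_id] if d not in positives]
        if not pool:
            continue  # no safe negatives available for this query
        for pos in sorted(positives):
            k = min(negatives_per_positive, len(pool))
            for neg in rng.sample(pool, k):
                triples.append((query_id, pos, neg))
    return triples


if __name__ == "__main__":
    candidates = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5"]}
    clicks = {"q1": {"d1", "d3"}, "q2": {"d5"}}
    for triple in build_training_triples(candidates, clicks):
        print(triple)
```

Filtering before sampling is the point of the sketch: a clicked document that also ranks highly under BM25 would otherwise be labeled as a negative, which is one way training data of this kind becomes noisy.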

Code & Models

Datasets

sebastian-hofstaetter/tripclick-training
dataset · 18 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling