Establishing Strong Baselines for TripClick Health Retrieval
Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury

TL;DR
This paper establishes strong Transformer-based re-ranking and dense retrieval baselines for the TripClick health retrieval collection. It cleans the noisy training data with a simple negative sampling policy, reports large gains over BM25, and compares domain-specific pre-trained models.
Contribution
The paper provides strong new baselines for TripClick health retrieval using Transformer-based re-ranking and dense retrieval, together with a negative sampling policy that cleans the training data and a study of domain-specific pre-trained models.
Findings
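Dense retrieval outperforms BM25 by considerable margins, even with simple training procedures.
A simple negative sampling policy cleans the originally noisy training data and enables large re-ranking gains over BM25.
The choice of domain-specific pre-trained model affects retrieval effectiveness on TripClick.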
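To make the dense retrieval finding concrete, the following is a minimal sketch of how a dense retriever ranks documents: queries and documents are encoded as vectors and scored by dot product. The vectors, document ids, and the `dense_retrieve` helper are hypothetical toy values for illustration; in practice the vectors would come from a trained Transformer encoder, and this is not the authors' implementation.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_retrieve(
    query_vec: List[float],
    doc_vecs: Dict[str, List[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every document by dot product with the query and return the top k."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (assumed values, not taken from the paper).
toy_docs = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.8, 0.1],
    "d3": [0.7, 0.2, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

top = dense_retrieve(toy_query, toy_docs, k=2)
top_ids = [doc_id for doc_id, _ in top]
```

Unlike BM25, which matches query terms against document terms, this scoring depends only on the learned vector representations, so documents can rank highly without sharing exact terms with the query.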
Abstract
We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection. We improve the (originally too noisy) training data with a simple negative sampling policy. We achieve large gains over BM25 in the re-ranking task of TripClick, which were not achieved with the original baselines. Furthermore, we study the impact of different domain-specific pre-trained models on TripClick. Finally, we show that dense retrieval outperforms BM25 by considerable margins, even with simple training procedures.
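The abstract does not spell out the negative sampling policy, so the following is only a sketch under a stated assumption: negatives are drawn from BM25 candidates for the query that have no recorded click, and each clicked document is paired with sampled negatives to form training triples. The function name, parameters, and toy ids are hypothetical.

```python
import random
from typing import List, Set, Tuple


def sample_training_triples(
    query_id: str,
    bm25_ranking: List[str],
    clicked: Set[str],
    negatives_per_positive: int = 1,
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Build (query, positive, negative) triples.

    Assumption: a negative is any BM25 candidate without a click for this query.
    """
    rng = random.Random(seed)
    # Candidate negatives: retrieved by BM25 but never clicked.
    candidates = [doc_id for doc_id in bm25_ranking if doc_id not in clicked]
    triples: List[Tuple[str, str, str]] = []
    if not candidates:
        return triples
    for positive in sorted(clicked):
        for _ in range(negatives_per_positive):
            triples.append((query_id, positive, rng.choice(candidates)))
    return triples


# Toy example with hypothetical ids.
triples = sample_training_triples(
    "q1",
    bm25_ranking=["d1", "d2", "d3", "d4"],
    clicked={"d2"},
    negatives_per_positive=3,
)
```

Excluding clicked documents from the negative pool is the part of this sketch that addresses label noise: it avoids training the model to push down a document that users actually found relevant.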
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling
