What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way
Erik Faessler, Michel Oleynik, Udo Hahn

TL;DR
This paper systematically analyzes the impact of various system features on the performance of a precision medicine search engine, using optimization and ablation studies on TREC-PM data.
Contribution
It introduces a systematic approach to evaluate and optimize individual system features affecting search engine performance in precision medicine.
Findings
Optimal feature configurations identified via SMAC
Key features like BM25 parameters and query expansion significantly impact performance
Systematic ablation reveals the contribution of each feature to effectiveness
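The ablation idea in the last finding can be sketched as follows: start from a reference configuration, switch off one feature at a time, and record how the evaluation score changes. This is only an illustration. The feature names, the `ablation_deltas` helper, and the `toy_evaluate` scorer with its made-up weights are hypothetical placeholders, not the paper's code or its results.

```python
from typing import Callable, Dict

Config = Dict[str, bool]


def ablation_deltas(
    reference: Config, evaluate: Callable[[Config], float]
) -> Dict[str, float]:
    """Score change when each enabled feature is switched off on its own."""
    baseline = evaluate(reference)
    deltas: Dict[str, float] = {}
    for feature, enabled in reference.items():
        if not enabled:
            continue  # nothing to ablate for a feature that is already off
        ablated = dict(reference)  # copy so the reference stays untouched
        ablated[feature] = False
        deltas[feature] = evaluate(ablated) - baseline
    return deltas


def toy_evaluate(config: Config) -> float:
    """Stand-in for a real retrieval evaluation; the weights are invented."""
    weights = {
        "query_expansion": 0.25,
        "stop_word_filtering": 0.125,
        "keyword_boosting": 0.5,
    }
    return sum(w for name, w in weights.items() if config.get(name))


reference = {
    "query_expansion": True,
    "stop_word_filtering": True,
    "keyword_boosting": True,
}
deltas = ablation_deltas(reference, toy_evaluate)
```

In a real setting `evaluate` would run the search engine with the given configuration and compute a retrieval metric against the gold standard; a negative delta then means the removed feature was helping.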
Abstract
From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite lots of performance measurements carried out in these evaluation campaigns, the scientific community is still pretty unsure about the impact individual system features and their weights have on the overall system performance. In order to overcome this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we employed the gold standard data…
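For readers unfamiliar with the BM25 parameters mentioned above, the sketch below shows one common BM25 variant, where `k1` controls term-frequency saturation and `b` controls document-length normalization. It is an assumption-laden illustration: the `bm25_score` function, the tokenized toy corpus, and the default values `k1=1.2` and `b=0.75` (conventional defaults) are not taken from the paper and are not its tuned configuration.

```python
import math
from collections import Counter
from typing import List


def bm25_score(
    query: List[str],
    doc: List[str],
    corpus: List[List[str]],
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Score one tokenized document against a tokenized query with BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue  # term contributes nothing to this document
        # Smoothed IDF that stays positive even for very frequent terms.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # b = 0 disables length normalization, b = 1 applies it fully.
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


# Hypothetical toy corpus of tokenized documents.
corpus = [
    ["brca1", "mutation", "breast", "cancer"],
    ["brca1", "trial"],
    ["melanoma", "braf"],
]
query = ["brca1"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
```

With `b` above zero, the shorter of two documents containing the query term once scores higher; setting `b=0.0` makes their scores equal. This is the kind of sensitivity that makes these parameters worth tuning.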
