Team IELAB at TREC Clinical Trial Track 2023: Enhancing Clinical Trial Retrieval with Neural Rankers and Large Language Models
Shengyao Zhuang, Bevan Koopman, Guido Zuccon

TL;DR
This paper describes a clinical trial retrieval system that trains PubmedBERT-based neural rankers on ChatGPT-generated synthetic patient descriptions together with human-annotated data from previous years, then re-ranks the results using judgments obtained by prompting GPT-4 as a TREC annotator.
Contribution
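The paper uses Large Language Models in two roles: ChatGPT for data augmentation (generating relevant patient descriptions for randomly selected clinical trials) and GPT-4 for re-ranking (judging the run files), both in support of neural retrieval models for clinical trial search.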
Findings
Effective use of ChatGPT for generating synthetic training data
Improved retrieval performance with combined neural rankers and LLMs
Successful integration of GPT-4 judgments, obtained by prompting it as a TREC annotator, for re-ranking
Abstract
We describe team ielab from CSIRO and The University of Queensland's approach to the 2023 TREC Clinical Trials Track. Our approach was to use neural rankers but to utilise Large Language Models to overcome the lack of training data for such rankers. Specifically, we employ ChatGPT to generate relevant patient descriptions for randomly selected clinical trials from the corpus. This synthetic dataset, combined with human-annotated training data from previous years, is used to train both dense and sparse retrievers based on PubmedBERT. Additionally, a cross-encoder re-ranker is integrated into the system. To further enhance the effectiveness of our approach, we prompt GPT-4 as a TREC annotator to provide judgments on our run files. These judgments are subsequently employed to re-rank the results. This architecture tightly integrates strong PubmedBERT-based rankers with the aid…
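The final step in the abstract, re-ranking a run with LLM-provided judgments, can be illustrated with a minimal sketch. The abstract does not say how the GPT-4 judgments are combined with the ranker scores, so the rule below (sort by judged label first, break ties by the original ranker score) is an assumption made purely for illustration, not the authors' method. The function name `rerank_with_judgments`, the trial identifiers, the scores, and the graded labels (higher meaning more relevant) are all hypothetical.

```python
from typing import Dict, List, Tuple


def rerank_with_judgments(
    run: List[Tuple[str, float]],
    judgments: Dict[str, int],
    default_label: int = 0,
) -> List[Tuple[str, float]]:
    """Re-rank a run using graded judgments (assumed rule, not from the paper).

    run:        (doc_id, ranker_score) pairs for one query.
    judgments:  doc_id -> graded label, higher meaning more relevant.
    Documents without a judgment fall back to default_label.
    """
    # Sort by label first, then by the original ranker score, both descending.
    return sorted(
        run,
        key=lambda item: (judgments.get(item[0], default_label), item[1]),
        reverse=True,
    )


# Hypothetical run for a single patient description (made-up IDs and scores).
run = [("NCT001", 12.3), ("NCT002", 11.8), ("NCT003", 9.4), ("NCT004", 7.1)]
# Hypothetical labels such as an LLM annotator might return.
judgments = {"NCT001": 0, "NCT002": 2, "NCT003": 1, "NCT004": 2}

reranked = rerank_with_judgments(run, judgments)
print([doc_id for doc_id, _ in reranked])
# → ['NCT002', 'NCT004', 'NCT003', 'NCT001']
```

Under this rule the ranker score only orders documents that share a label, so the judgments dominate the final ranking; a softer alternative would interpolate the two signals, but that would also be a design choice the abstract does not specify.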
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Label Smoothing · Multi-Head Attention · Adam · Dropout · Absolute Position Encodings · Layer Normalization
