Hybrid Retrieval for COVID-19 Literature: Comparing Rank Fusion and Projection Fusion with Diversity Reranking
Harishkumar Kishorkumar Prajapati

TL;DR
This paper introduces a hybrid retrieval system for COVID-19 literature and compares rank fusion with projection fusion: rank fusion gives the best relevance, while projection fusion is faster and returns more diverse results.
Contribution
It presents a novel projection-based vector fusion approach (B5) and evaluates its performance against traditional rank fusion (RRF) in a large-scale COVID-19 literature retrieval task.
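For readers unfamiliar with the rank-fusion baseline, the following is a minimal sketch of reciprocal rank fusion over two ranked lists. It is an illustration, not the paper's code: the function name `rrf_fuse`, the document ids, and the smoothing constant `k=60` (a commonly used default) are assumptions, since the summary does not state the value used.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=10):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion.

    k=60 is an assumed default; the summary does not give the paper's value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by doc id for determinism.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


# Hypothetical runs standing in for the sparse and dense retrievers.
sparse_run = ["d1", "d2", "d3"]
dense_run = ["d3", "d1", "d4"]
fused = rrf_fuse([sparse_run, dense_run])
```

Because RRF uses only rank positions, it needs no score normalization between the sparse and dense retrievers, which is one reason it is a common baseline for hybrid search.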
Findings
RRF achieves the highest relevance (nDCG@10 = 0.828).
Projection fusion (B5) is 33% faster (847 ms vs. 1271 ms) and yields 2.2x higher ILD@10 than RRF.
Both methods maintain sub-2-second latency and are deployed as a web app.
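The diversity figure above is reported as ILD@10 (intra-list diversity). The summary does not give the exact definition used, so the sketch below assumes the common formulation: the mean pairwise cosine distance among the embeddings of the top-k results. The names `cosine_distance` and `ild_at_k` are illustrative.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def ild_at_k(embeddings, k=10):
    """Mean pairwise cosine distance over the first k result embeddings.

    Assumed definition; the paper may use a different distance or weighting.
    """
    pairs = list(combinations(embeddings[:k], 2))
    if not pairs:
        return 0.0  # Fewer than two results: no pairs to compare.
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)
```

Under this definition, a list of near-duplicate documents scores close to 0 and a list of mutually orthogonal embeddings scores 1.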
Abstract
We present a hybrid retrieval system for COVID-19 scientific literature, evaluated on the TREC-COVID benchmark (171,332 papers, 50 expert queries). The system implements six retrieval configurations spanning sparse (SPLADE), dense (BGE), rank-level fusion (RRF), and a projection-based vector fusion (B5) approach. RRF fusion achieves the best relevance (nDCG@10 = 0.828), outperforming dense-only by 6.1% and sparse-only by 14.9%. Our projection fusion variant reaches nDCG@10 = 0.678 on expert queries while being 33% faster (847 ms vs. 1271 ms) and producing 2.2x higher ILD@10 than RRF. Evaluation across 400 queries -- including expert, machine-generated, and three paraphrase styles -- shows that B5 delivers the largest relative gain on keyword-heavy reformulations (+8.8%), although RRF remains best in absolute nDCG@10. On expert queries, MMR reranking increases intra-list diversity by…
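The abstract mentions MMR reranking as the diversity step. A minimal sketch of greedy maximal marginal relevance is given below, assuming cosine similarity for both relevance and redundancy. The function names, the trade-off weight `lam`, and the toy vectors are assumptions for illustration; the summary does not state the paper's settings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mmr_rerank(query_vec, doc_vecs, lam=0.7, top_n=10):
    """Greedy MMR: return indices into doc_vecs in selection order.

    lam weights relevance against redundancy; 0.7 is an assumed value.
    """
    remaining = list(range(len(doc_vecs)))
    selected = []
    while remaining and len(selected) < top_n:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example: document 1 is nearly a duplicate of document 0.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
diverse = mmr_rerank(query, docs, lam=0.3, top_n=2)
relevance_only = mmr_rerank(query, docs, lam=1.0, top_n=2)
```

With `lam=1.0` the procedure reduces to plain relevance ranking; lowering `lam` penalizes candidates that resemble already selected results, which is the mechanism that raises intra-list diversity.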
