Evaluating Hybrid Retrieval Augmented Generation using Dynamic Test Sets: LiveRAG Challenge
Chase Fensore, Kaustubh Dhole, Joyce C Ho, Eugene Agichtein

TL;DR
This paper evaluates a hybrid retrieval-augmented generation system on dynamic test sets, combining sparse and dense retrieval methods, and analyzes the impact of re-ranking, prompting strategies, and vocabulary alignment on performance.
Contribution
It introduces a hybrid retrieval approach for RAG systems, assesses re-ranking and prompting strategies, and identifies vocabulary alignment as a key performance predictor.
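The summary does not say how the sparse and dense result lists are combined. The sketch below assumes reciprocal rank fusion (RRF) with the commonly used constant k = 60. The function name `rrf_fuse` and the document IDs are hypothetical, and the authors' actual fusion method may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=10):
    """Merge ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1; a list that omits a document adds nothing.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID so output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


# Hypothetical top hits from a sparse (BM25) and a dense (E5) retriever.
bm25_hits = ["d3", "d1", "d7", "d2"]
e5_hits = ["d1", "d4", "d3", "d9"]
print(rrf_fuse([bm25_hits, e5_hits], top_n=5))  # ['d1', 'd3', 'd4', 'd7', 'd2']
```

RRF uses only ranks, not raw scores. This avoids having to calibrate BM25 scores against E5 cosine similarities, which live on different scales.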
Findings
Neural re-ranking with RankLLaMA raises MAP from 0.523 to 0.797 but increases per-question latency from 1.74 s to 84 s.
DSPy-optimized prompting increases semantic similarity (0.771 vs 0.668), but its 0% refusal rate suggests over-confidence.
Vocabulary alignment correlates strongly with system performance.
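The summary names vocabulary alignment as a performance predictor but does not define it. One plausible proxy, assumed here only for illustration, is query-term coverage: the fraction of distinct question terms that occur in the retrieved passages, correlated against a per-question quality score. The helper names (`tokenize`, `term_coverage`, `pearson`) and the toy numbers are hypothetical and are not data from the paper.

```python
import math
import re


def tokenize(text):
    """Lowercased alphanumeric tokens; deliberately simple (no stemming or stopwords)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_coverage(question, passages):
    """Hypothetical alignment proxy: share of distinct question terms found in any passage."""
    q_terms = set(tokenize(question))
    if not q_terms:
        return 0.0
    passage_terms = set()
    for passage in passages:
        passage_terms.update(tokenize(passage))
    return len(q_terms & passage_terms) / len(q_terms)


def pearson(xs, ys):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(term_coverage("What causes ocean tides?", ["The moon causes tides in the ocean."]))  # 0.75

# Toy per-question values (not results from the paper): coverage vs. an answer-quality score.
coverage = [0.25, 0.5, 0.75, 1.0]
quality = [0.3, 0.5, 0.6, 0.9]
print(round(pearson(coverage, quality), 2))  # 0.98
```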
Abstract
We present our submission to the LiveRAG Challenge 2025, which evaluates retrieval-augmented generation (RAG) systems on dynamic test sets using the FineWeb-10BT corpus. Our final hybrid approach combines sparse (BM25) and dense (E5) retrieval methods and then aims to generate relevant and faithful answers with Falcon3-10B-Instruct. Through systematic evaluation on 200 synthetic questions generated with DataMorgana across 64 unique question-user combinations, we demonstrate that neural re-ranking with RankLLaMA improves MAP from 0.523 to 0.797 (52% relative improvement) but introduces prohibitive computational costs (84s vs 1.74s per question). While DSPy-optimized prompting strategies achieved higher semantic similarity (0.771 vs 0.668), their 0% refusal rates raised concerns about over-confidence and generalizability. Our submitted hybrid system without re-ranking achieved 4th place…
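For reference, MAP averages per-question average precision (AP). The sketch below uses the standard AP definition, which divides by the number of relevant documents. The abstract does not state the cutoff depth or how relevance was judged, so both are assumptions here, and the toy document IDs are hypothetical. The last lines recompute the 52% relative MAP gain and the latency ratio from the abstract's figures.

```python
def average_precision(ranking, relevant):
    """AP for one question: sum of precision@k at each relevant hit, divided by |relevant|.

    Relevant documents that never appear in the ranking therefore contribute zero.
    """
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP: mean AP over (ranking, relevant_set) pairs, one pair per question."""
    return sum(average_precision(ranking, rel) for ranking, rel in runs) / len(runs)


def relative_improvement(before, after):
    """Relative change from `before` to `after`, e.g. 0.5 means +50%."""
    return (after - before) / before


# Toy example with hypothetical document IDs.
runs = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y", "z"], {"y"})]
print(round(mean_average_precision(runs), 4))  # 0.6667

# Figures reported in the abstract.
print(f"{relative_improvement(0.523, 0.797):.1%}")  # 52.4%
print(f"{84 / 1.74:.1f}x slower with re-ranking")  # 48.3x slower with re-ranking
```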
