Rethinking Hybrid Retrieval: When Small Embeddings and LLM Re-ranking Beat Bigger Models
Arjun Rao, Hanieh Alipour, Nick Pendar

TL;DR
This study shows that small, well-aligned embedding models like MiniLM-v6 can outperform larger models in hybrid retrieval systems when combined with LLM re-ranking, leading to better accuracy and efficiency.
Contribution
The paper demonstrates that compact embedding models can outperform larger ones in hybrid retrieval when integrated with LLM re-ranking, challenging the assumption that bigger models are always better.
Findings
MiniLM-v6 outperforms BGE-Large in hybrid retrieval tasks.
Embedding-LLM alignment improves retrieval quality.
Smaller models reduce computational costs while maintaining accuracy.
Abstract
This paper presents a comparison of embedding models in tri-modal hybrid retrieval for Retrieval-Augmented Generation (RAG) systems. We investigate the fusion of dense semantic, sparse lexical, and graph-based embeddings, focusing on the performance of the MiniLM-v6 and BGE-Large architectures. Contrary to conventional assumptions, our results show that the compact MiniLM-v6 outperforms the larger BGE-Large when integrated with LLM-based re-ranking within our tri-modal hybrid framework. Experiments conducted on the SciFact, FIQA, and NFCorpus datasets demonstrate significant improvements in retrieval quality with the MiniLM-v6 configuration. The performance difference is particularly pronounced in agentic re-ranking scenarios, indicating better alignment between MiniLM-v6's embedding space and LLM reasoning. Our findings suggest that embedding model selection for RAG systems should…
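The pipeline described in the abstract can be illustrated with a minimal sketch: three retrievers (dense, sparse, graph-based) each return a ranked list, the lists are fused, and the top candidates are re-ordered by a re-ranker. The abstract does not say how the paper fuses its three signals, so reciprocal rank fusion is used here purely as an assumed stand-in. The function names, the toy document ids, and the `llm_scores` table (a placeholder for an actual LLM call) are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one list, best first.

    Assumption: RRF is an illustrative choice, not the paper's stated method.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(candidates: Sequence[str],
           score_fn: Callable[[str], float],
           top_n: int) -> List[str]:
    """Re-order the first top_n candidates with an external scorer.

    score_fn stands in for an LLM relevance judgment (hypothetical).
    Candidates beyond top_n keep their fused order.
    """
    head = sorted(candidates[:top_n], key=score_fn, reverse=True)
    return head + list(candidates[top_n:])


# Hypothetical toy rankings from the three retrieval modes.
dense = ["d1", "d2", "d3"]
sparse = ["d2", "d1", "d4"]
graph = ["d2", "d3", "d1"]

fused = reciprocal_rank_fusion([dense, sparse, graph])

# Placeholder relevance scores; a real system would query an LLM here.
llm_scores = {"d1": 0.9, "d2": 0.4, "d3": 0.7, "d4": 0.1}
final = rerank(fused, lambda d: llm_scores[d], top_n=3)

print(fused)
print(final)
```

In this sketch the re-ranker only sees the fused shortlist, so the embedding model's role is to get relevant documents into that shortlist rather than to order them perfectly. That is one plausible reading of why a compact embedding model could be sufficient when an LLM does the final ordering.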
Taxonomy
Topics: Data Quality and Management
