Deep Retrieval at CheckThat! 2025: Identifying Scientific Papers from Implicit Social Media Mentions via Hybrid Retrieval and Re-Ranking
Pascal J. Sager, Ashwini Kamaraj, Benjamin F. Grewe, Thilo Stadelmann

TL;DR
This paper introduces a hybrid retrieval system combining lexical, semantic, and re-ranking techniques to effectively identify scientific papers from informal social media mentions, achieving top performance in a competitive benchmark.
Contribution
The authors develop a novel hybrid retrieval pipeline that integrates BM25, dense semantic search, and LLM-based re-ranking, demonstrating high effectiveness without external training data.
Findings
Achieved 76.46% MRR@5 on development set
Ranked 3rd out of 31 teams on the test leaderboard
Effective retrieval with open-source models without external data
Abstract
We present the methodology and results of the Deep Retrieval team for subtask 4b of the CLEF CheckThat! 2025 competition, which focuses on retrieving relevant scientific literature for given social media posts. To address this task, we propose a hybrid retrieval pipeline that combines lexical precision, semantic generalization, and deep contextual re-ranking, enabling robust retrieval that bridges the informal-to-formal language gap. Specifically, we combine BM25-based keyword matching with a FAISS vector store using a fine-tuned INF-Retriever-v1 model for dense semantic retrieval. BM25 returns the top 30 candidates, and semantic search yields 100 candidates, which are then merged and re-ranked via a large language model (LLM)-based cross-encoder. Our approach achieves a mean reciprocal rank at 5 (MRR@5) of 76.46% on the development set and 66.43% on the hidden test set, securing the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Biomedical Text Mining and Ontologies · Expert finding and Q&A systems
MethodsSparse Evolutionary Training
