DS@GT at CheckThat! 2025: Exploring Retrieval and Reranking Pipelines for Scientific Claim Source Retrieval on Social Media Discourse
Jeanette Schofield, Shuyu Tian, Hoang Thanh Thanh Truong, Maximilian Heil

TL;DR
This paper presents a retrieval and reranking pipeline for identifying the scientific sources behind claims made on social media. Using data augmentation and a fine-tuned bi-encoder, it achieves an MRR@5 of 0.58, an improvement of 0.15 over the BM25 baseline of 0.43.
Contribution
The team explored 6 data augmentation techniques and 7 retrieval and reranking pipelines, fine-tuned a bi-encoder, and improved on the BM25 baseline for scientific claim source retrieval from tweets.
Findings
Achieved an MRR@5 of 0.58 on CLEF 2025 CheckThat! Lab Task 4b, ranking 16th out of 30 teams.
Improved MRR@5 by 0.15 over the BM25 baseline of 0.43.
Released the code on GitHub for reproducibility.
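The MRR@5 figures above can be read with the help of a small worked example. The sketch below assumes each tweet has exactly one gold paper and that a tweet contributes 0 when its gold paper is missing from the top 5. The function name `mrr_at_k`, the tweet IDs, and the paper IDs are hypothetical and are not taken from the team's code.

```python
from typing import Dict, List


def mrr_at_k(rankings: Dict[str, List[str]], gold: Dict[str, str], k: int = 5) -> float:
    """Mean reciprocal rank, counting only hits within the top k results."""
    total = 0.0
    for query_id, ranked_ids in rankings.items():
        target = gold[query_id]
        # Ranks are 1-based; only the first k candidates are considered.
        for rank, doc_id in enumerate(ranked_ids[:k], start=1):
            if doc_id == target:
                total += 1.0 / rank
                break
    return total / len(rankings) if rankings else 0.0


# Illustrative data only (assumed, not from the task dataset).
rankings = {
    "tweet_1": ["p3", "p1", "p7", "p2", "p9"],  # gold paper at rank 2 -> 1/2
    "tweet_2": ["p4", "p5", "p6", "p8", "p0"],  # gold paper not in top 5 -> 0
    "tweet_3": ["p2", "p1", "p3", "p4", "p5"],  # gold paper at rank 1 -> 1
}
gold = {"tweet_1": "p1", "tweet_2": "p2", "tweet_3": "p2"}

score = mrr_at_k(rankings, gold)  # (0.5 + 0 + 1) / 3 = 0.5
```

Under this definition, a score of 0.58 means the gold paper tends to appear near the top of the list, though usually not always in first position.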
Abstract
Social media users often make scientific claims without citing where these claims come from, generating a need to verify these claims. This paper details work done by the DS@GT team for CLEF 2025 CheckThat! Lab Task 4b, Scientific Claim Source Retrieval, which seeks to find relevant scientific papers based on implicit references in tweets. Our team explored 6 different data augmentation techniques, 7 different retrieval and reranking pipelines, and fine-tuned a bi-encoder. Achieving an MRR@5 of 0.58, our team ranked 16th out of 30 teams for the CLEF 2025 CheckThat! Lab Task 4b, an improvement of 0.15 over the BM25 baseline of 0.43. Our code is available on GitHub at https://github.com/dsgt-arc/checkthat-2025-swd/tree/main/subtask-4b.
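To make the retrieve-then-rerank idea concrete, here is a minimal, dependency-free sketch. It is not the team's pipeline: the first stage is a plain Okapi BM25 scorer written out by hand, and the second stage uses Jaccard token overlap purely as a stand-in for a learned reranker. All function names, parameter values (`k1`, `b`, `first_stage_k`, `final_k`), and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    # Whitespace tokenization; a real system would use a proper tokenizer.
    return text.lower().split()


def bm25_scores(query: str, docs: Dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> Dict[str, float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def jaccard(query: str, doc: str) -> float:
    # Stand-in reranker (assumption): token-set overlap, not a trained model.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def retrieve_then_rerank(query: str, docs: Dict[str, str],
                         rerank_score: Callable[[str, str], float],
                         first_stage_k: int = 3, final_k: int = 2) -> List[str]:
    """Take the top BM25 candidates, then reorder them with a second scorer."""
    first = bm25_scores(query, docs)
    candidates = sorted(first, key=first.get, reverse=True)[:first_stage_k]
    reranked = sorted(candidates,
                      key=lambda doc_id: rerank_score(query, docs[doc_id]),
                      reverse=True)
    return reranked[:final_k]


# Toy corpus and tweet-like query (assumed, not from the task data).
papers = {
    "p1": "vitamin d supplementation and respiratory infection risk",
    "p2": "deep learning for protein structure prediction",
    "p3": "sleep duration and cardiovascular outcomes in adults",
    "p4": "vitamin c intake and common cold duration",
}
query = "study says vitamin d lowers infection risk"
top = retrieve_then_rerank(query, papers, jaccard)
```

The design point is that the cheap first stage narrows the corpus to a short candidate list so that a more expensive scorer only has to look at a handful of papers per tweet.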
Taxonomy
Topics: Expert finding and Q&A systems · Topic Modeling · Misinformation and Its Impacts
