TL;DR
This paper presents a multilingual and crosslingual fact-checked claim retrieval system built on a lightweight bi-encoder transformer, reaching 92% Success@10 in multilingual retrieval and 80% Success@10 in crosslingual retrieval in SemEval-2025 Task 7.
Contribution
It introduces a multilingual and crosslingual retrieval approach based on a fine-tuned bi-encoder transformer optimized for sentence similarity, with efficient training on Kaggle T4 GPUs.
Findings
92% Success@10 in multilingual retrieval
80% Success@10 in crosslingual retrieval
Ranked 5th in crosslingual and 10th in multilingual tracks
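The Success@10 figures above can be read with a small sketch of the metric. It assumes the usual definition, where a query (a social media post) counts as a hit when at least one of its relevant fact-checks appears among the top K retrieved items. The function name `success_at_k` and the toy IDs are illustrative and are not taken from the paper or the task's official scorer.

```python
from typing import Dict, List, Set


def success_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Fraction of queries with at least one relevant item in the top k.

    ranked:   query id -> retrieved fact-check ids, best first
    relevant: query id -> set of gold fact-check ids
    """
    if not ranked:
        raise ValueError("ranked must contain at least one query")
    hits = 0
    for query_id, ranking in ranked.items():
        gold = relevant.get(query_id, set())
        # A query is a hit if any gold id appears in its top-k slice.
        if gold & set(ranking[:k]):
            hits += 1
    return hits / len(ranked)


# Toy data (hypothetical ids): post "p1" has its gold fact-check at rank 2,
# post "p2" has none of its gold fact-checks retrieved.
toy_ranked = {"p1": ["fc9", "fc1", "fc4"], "p2": ["fc7", "fc8", "fc3"]}
toy_relevant = {"p1": {"fc1"}, "p2": {"fc2"}}
toy_score = success_at_k(toy_ranked, toy_relevant, k=10)
```

On this toy data one of the two posts is a hit, so the score is 0.5. A reported 92% Success@10 would mean that, under this definition, 92% of posts had a correct fact-check somewhere in their top ten results.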
Abstract
SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval is approached as a Learning-to-Rank task using a bi-encoder model fine-tuned from a pre-trained transformer optimized for sentence similarity. Training used both the source languages and their English translations for multilingual retrieval, and only the English translations for crosslingual retrieval. Using lightweight models with fewer than 500M parameters and training on Kaggle T4 GPUs, the method achieved 92% Success@10 in the multilingual track and 80% Success@10 in the crosslingual track, ranking 5th in crosslingual and 10th in multilingual.
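The retrieval step of a bi-encoder can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the post and each fact-checked claim have already been embedded separately by the same encoder, and that candidates are ranked by cosine similarity, a common choice for sentence-similarity models. The function names and the two-dimensional toy vectors are hypothetical stand-ins for real encoder output.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_fact_checks(
    post_vec: Sequence[float],
    fact_check_vecs: Dict[str, Sequence[float]],
    k: int = 10,
) -> List[str]:
    """Return the ids of the k fact-checks most similar to the post."""
    scored = sorted(
        fact_check_vecs.items(),
        key=lambda item: cosine(post_vec, item[1]),
        reverse=True,
    )
    return [fact_check_id for fact_check_id, _ in scored[:k]]


# Toy embeddings (hypothetical): in practice these come from the encoder.
toy_post = [1.0, 0.0]
toy_fact_checks = {
    "fc_a": [0.9, 0.1],   # nearly parallel to the post
    "fc_b": [0.0, 1.0],   # orthogonal
    "fc_c": [-1.0, 0.0],  # opposite direction
}
toy_top2 = rank_fact_checks(toy_post, toy_fact_checks, k=2)
```

Because the two sides are encoded independently, fact-check embeddings can be computed once and reused for every incoming post, which is what keeps this style of retrieval cheap at inference time.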