TL;DR
This paper introduces a zero-shot system for multilingual fact-checked claim retrieval that ranks claims by the cosine similarity of large language model text embeddings, placing 7th in the monolingual and 9th in the cross-lingual subtask of SemEval-2025 Task 7.
Contribution
The work demonstrates the effectiveness of combining large language models for multilingual claim retrieval without task-specific training.
Findings
Achieved 7th place monolingually and 9th cross-lingually in the competition.
English translations were used as the only input to the embedding models because multilingual models did not achieve satisfactory results.
NVIDIA NV-Embed-v2 gave the best results overall; combining NV-Embed with GPT or Mistral improved results for some languages.
Abstract
This paper presents a zero-shot system for fact-checked claim retrieval. We employed several state-of-the-art large language models to obtain text embeddings. The models were then combined to obtain the best possible result. Our approach achieved 7th place in monolingual and 9th in cross-lingual subtasks. We used only English translations as an input to the text embedding models since multilingual models did not achieve satisfactory results. We identified the most relevant claims for each post by leveraging the embeddings and measuring cosine similarity. Overall, the best results were obtained by the NVIDIA NV-Embed-v2 model. For some languages, we benefited from model combinations (NV-Embed & GPT or Mistral).
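The retrieval step described in the abstract (embed posts and claims, then rank claims by cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `top_k` default, and the toy vectors are hypothetical, and the abstract does not say how the models were combined, so the weighted average of per-model similarity scores shown here is only an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_claims(post_vec, claim_vecs, top_k=10):
    """Return indices of the top_k claims most similar to the post."""
    scored = [(cosine(post_vec, c), i) for i, c in enumerate(claim_vecs)]
    # Sort by descending similarity; break ties by claim index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:top_k]]


def rank_claims_combined(post_by_model, claims_by_model, weights=None, top_k=10):
    """Rank claims by a weighted average of per-model cosine similarities.

    ASSUMPTION: score averaging is one plausible way to combine models;
    the source does not specify the combination method.
    post_by_model[m] is the post embedding from model m;
    claims_by_model[m][i] is the embedding of claim i from model m.
    """
    n_models = len(post_by_model)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_claims = len(claims_by_model[0])
    combined = [0.0] * n_claims
    for w, post_vec, claim_vecs in zip(weights, post_by_model, claims_by_model):
        for i, claim_vec in enumerate(claim_vecs):
            combined[i] += w * cosine(post_vec, claim_vec)
    order = sorted(range(n_claims), key=lambda i: (-combined[i], i))
    return order[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real embeddings would come from
    # models such as NV-Embed-v2 applied to English translations.
    post = [1.0, 0.0]
    claims = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    print(rank_claims(post, claims, top_k=2))
```

Because each model's embeddings are only ever compared within that model's own vector space, the models in such a combination may have different embedding dimensions.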