UWBa at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

Ladislav Lenc; Daniel Cífka; Jiří Martínek; Jakub Šmíd; Pavel Král

arXiv:2508.09517·cs.CL·August 14, 2025

TL;DR

This paper presents a zero-shot system for fact-checked claim retrieval that ranks claims by cosine similarity over large language model embeddings, combining models for some languages. It placed 7th in the monolingual and 9th in the cross-lingual subtask of SemEval-2025 Task 7.

Contribution

The work demonstrates the effectiveness of combining large language models for multilingual claim retrieval without task-specific training.

Findings

01

Placed 7th in the monolingual subtask and 9th in the cross-lingual subtask.

02

Multilingual embedding models did not achieve satisfactory results, so only English translations were used as input.

03

NVIDIA NV-Embed-v2 gave the best results overall; combining NV-Embed with GPT or Mistral helped for some languages.

Abstract

This paper presents a zero-shot system for fact-checked claim retrieval. We employed several state-of-the-art large language models to obtain text embeddings. The models were then combined to obtain the best possible result. Our approach achieved 7th place in the monolingual and 9th place in the cross-lingual subtask. We used only English translations as input to the text embedding models, since multilingual models did not achieve satisfactory results. We identified the most relevant claims for each post by leveraging the embeddings and measuring cosine similarity. Overall, the best results were obtained by the NVIDIA NV-Embed-v2 model. For some languages, we benefited from model combinations (NV-Embed & GPT or Mistral).
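
The retrieval step described in the abstract (embed posts and claims, then rank claims by cosine similarity) can be sketched as below. This is a minimal illustration, not the authors' implementation: the embeddings are assumed to be precomputed lists of floats, the function names `cosine_similarity`, `rank_claims`, and `rank_claims_combined` are hypothetical, and combining models by summing their similarity scores is an assumption, since the abstract does not say how the models were combined.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_claims(post_vec: Vector, claim_vecs: Dict[str, Vector],
                top_k: int = 10) -> List[str]:
    """Return the ids of the top_k claims most similar to one post."""
    scores = {cid: cosine_similarity(post_vec, vec)
              for cid, vec in claim_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def rank_claims_combined(post_vecs: List[Vector],
                         claim_vecs_per_model: List[Dict[str, Vector]],
                         top_k: int = 10) -> List[str]:
    """Rank claims using several embedding models at once.

    Assumption: models are combined by summing per-model cosine scores
    (equivalent to averaging for ranking purposes). Entry i of post_vecs
    and claim_vecs_per_model must come from the same model.
    """
    combined: Dict[str, float] = {}
    for post_vec, claim_vecs in zip(post_vecs, claim_vecs_per_model):
        for cid, vec in claim_vecs.items():
            combined[cid] = combined.get(cid, 0.0) + cosine_similarity(post_vec, vec)
    return sorted(combined, key=combined.get, reverse=True)[:top_k]
```

With real data, each vector would come from an embedding model applied to the English translation of a post or claim; summing scores is only one of several plausible combination schemes (concatenating embeddings or fusing ranks are others).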
