Claim2Vec: Embedding Fact-Check Claims for Multilingual Similarity and Clustering

Rrubaa Panchendrarajan; Arkaitz Zubiaga

arXiv:2604.09812 · cs.CL · April 16, 2026

TL;DR

Claim2Vec is a multilingual embedding model for fact-check claims. It is obtained by fine-tuning a multilingual encoder with contrastive learning on similar claim pairs, which improves cross-lingual claim representation and claim clustering performance.

Contribution

The paper introduces Claim2Vec, the first multilingual claim embedding model optimized for clustering, and reports significant improvements across multiple datasets and clustering algorithms.

Findings

01 Claim2Vec improves claim clustering performance.

02 Fine-tuning enhances cross-lingual claim representation.

03 Clusters with multiple languages benefit from the model.
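
To make the clustering setting in these findings concrete, here is a minimal sketch of grouping claim embeddings by cosine similarity. Everything in it is an assumption for illustration: the toy vectors stand in for real claim embeddings, the function name `cluster_by_cosine_threshold` and the 0.8 threshold are hypothetical, and this simple connected-components approach is not claimed to be one of the clustering algorithms evaluated in the paper.

```python
import numpy as np


def cluster_by_cosine_threshold(embeddings, threshold=0.8):
    """Group rows whose cosine similarity reaches `threshold` (hypothetical helper).

    Two claims land in the same cluster if they are linked by a chain of
    pairs that each meet the threshold (connected components).
    """
    # Normalize rows so that a dot product equals cosine similarity.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    n = len(x)
    labels = [-1] * n  # -1 marks "not assigned yet"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the similarity graph from this claim.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy vectors standing in for claim embeddings (made-up values).
# Rows 0-1 and rows 2-3 imitate the same claim stated in two languages.
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.95, 0.0],
    [0.0, 0.0, 1.0],
])
labels = cluster_by_cosine_threshold(toy_embeddings)
print(labels)
```

In practice the rows would be vectors produced by an embedding model, and the better the embedding space separates distinct claims, the less sensitive a grouping like this is to the chosen threshold.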

Abstract

Recurrent claims present a major challenge for automated fact-checking systems designed to combat misinformation, especially in multilingual settings. While tasks such as claim matching and fact-checked claim retrieval aim to address this problem by linking claim pairs, the broader challenge of effectively representing groups of similar claims that can be resolved with the same fact-check via claim clustering remains relatively underexplored. To address this gap, we introduce Claim2Vec, the first multilingual embedding model optimized to represent fact-check claims as vectors in an improved semantic embedding space. We fine-tune a multilingual encoder using contrastive learning with similar multilingual claim pairs. Experiments on the claim clustering task using three datasets, 14 multilingual embedding models, and 7 clustering algorithms demonstrate that Claim2Vec significantly…
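
The abstract states that the encoder is fine-tuned with contrastive learning on similar multilingual claim pairs, but the visible text does not give the exact objective. The sketch below shows one common formulation, an InfoNCE-style loss with in-batch negatives, purely as an assumption about what such training can look like. The function names, the temperature of 0.05, and the random vectors standing in for encoder outputs are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """InfoNCE-style loss (assumed formulation, not confirmed by the source).

    Row i of `anchors` and row i of `positives` are embeddings of a similar
    claim pair. Every other row in the batch acts as a negative for row i.
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature  # shape (batch, batch)
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct match for anchor i is positive i, i.e. the diagonal.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))                   # stand-in encoder outputs
aligned = anchors + 0.01 * rng.normal(size=(4, 8))  # near-duplicate pairs
mismatched = aligned[::-1]                          # pairs deliberately misaligned

loss_aligned = in_batch_contrastive_loss(anchors, aligned)
loss_mismatched = in_batch_contrastive_loss(anchors, mismatched)
print(loss_aligned, loss_mismatched)
```

The loss is low when each claim is closest to its own paired claim and high when the pairs are misaligned, which is the pressure that pulls similar claims together in the embedding space regardless of language.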

Code & Models

Models

🤗 Rrubaa/claim2vec (model, 3 downloads)
