TL;DR
Claim2Vec is a multilingual embedding model, fine-tuned with contrastive learning on similar multilingual claim pairs, that improves claim clustering for fact-checking and strengthens cross-lingual claim representation.
Contribution
The paper introduces Claim2Vec, described by the authors as the first multilingual claim embedding model optimized for clustering, and reports significant improvements in experiments spanning three datasets, 14 multilingual embedding models, and 7 clustering algorithms.
Findings
Claim2Vec improves claim clustering performance.
Fine-tuning enhances cross-lingual claim representation.
Clusters with multiple languages benefit from the model.
Abstract
Recurrent claims present a major challenge for automated fact-checking systems designed to combat misinformation, especially in multilingual settings. While tasks such as claim matching and fact-checked claim retrieval aim to address this problem by linking claim pairs, the broader challenge of effectively representing groups of similar claims that can be resolved with the same fact-check via claim clustering remains relatively underexplored. To address this gap, we introduce Claim2Vec, the first multilingual embedding model optimized to represent fact-check claims as vectors in an improved semantic embedding space. We fine-tune a multilingual encoder using contrastive learning with similar multilingual claim pairs. Experiments on the claim clustering task using three datasets, 14 multilingual embedding models, and 7 clustering algorithms demonstrate that Claim2Vec significantly…
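The abstract says the encoder is fine-tuned with contrastive learning on similar claim pairs but, in the excerpt above, does not name the loss. As a rough illustration only, the sketch below shows one common choice, an InfoNCE-style loss with in-batch negatives, in plain Python. The function names, the temperature value, and the toy 3-dimensional "embeddings" are all assumptions made for this example and are not taken from the paper.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def info_nce_loss(anchors, positives, temperature=0.05):
    """InfoNCE-style loss with in-batch negatives (an assumed, illustrative choice).

    positives[i] is the claim paired with anchors[i]; every other row in
    `positives` serves as a negative for anchors[i].
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Temperature-scaled cosine similarity of anchor i to every candidate.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over the candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy term: the matching pair should score highest.
        total += log_denominator - logits[i]
    return total / len(a)

# Hypothetical toy embeddings: each anchor is close to its own positive.
anchors = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
positives = [[0.9, 0.2, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

aligned = info_nce_loss(anchors, positives)
# Rotating the positives breaks every pairing, so the loss should rise.
shuffled = info_nce_loss(anchors, positives[1:] + positives[:1])
```

Minimizing a loss of this kind pulls paired claims together and pushes unpaired claims in the same batch apart, which is the property a clustering algorithm later relies on. In practice the vectors would come from the multilingual encoder being fine-tuned rather than from hand-written lists.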
