Semi-automated Fact-checking in Portuguese: Corpora Enrichment using Retrieval with Claim extraction
Juliana Resplande Sant'anna Gomes, Arlindo Rodrigues Galvão Filho

TL;DR
This paper presents a semi-automated fact-checking methodology for Portuguese news, enriching corpora with external evidence using LLMs and search APIs, addressing data scarcity and improving verification processes.
Contribution
It introduces a novel approach combining claim extraction, evidence retrieval, and data validation to enhance Portuguese fact-checking datasets with external evidence.
Findings
Enriched Portuguese news corpora with external evidence.
Implemented claim extraction and evidence retrieval pipeline.
Improved data quality through validation and near-duplicate detection.
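The summary does not say how near-duplicates were detected. As an illustration only, one common technique is to compare word shingles with Jaccard similarity. The sketch below assumes that technique; the function names, the shingle size `k=3`, and the `0.8` threshold are hypothetical choices, not values taken from the paper.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and replace non-alphanumeric runs with one space."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents).strip()


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of the normalized text."""
    tokens = normalize(text).split()
    if len(tokens) < k:
        # Short texts become a single shingle (or an empty set if no tokens).
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(docs, threshold=0.8, k=3):
    """Return index pairs (i, j), i < j, whose shingle similarity meets the threshold."""
    sets = [shingles(d, k) for d in docs]
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs


docs = [
    "O governo anunciou novas medidas econômicas hoje.",
    "O governo anunciou novas medidas economicas hoje!",
    "A vacina foi aprovada pela agência reguladora.",
]
duplicate_pairs = find_near_duplicates(docs)
```

The pairwise loop is quadratic in the number of documents. For a large corpus, an approximate method such as MinHash with locality-sensitive hashing would likely be needed instead.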
Abstract
The accelerated dissemination of disinformation often outpaces the capacity for manual fact-checking, highlighting the urgent need for Semi-Automated Fact-Checking (SAFC) systems. Within the Portuguese language context, there is a noted scarcity of publicly available datasets that integrate external evidence, an essential component for developing robust AFC systems, as many existing resources focus solely on classification based on intrinsic text features. This dissertation addresses this gap by developing, applying, and analyzing a methodology to enrich Portuguese news corpora (Fake.Br, COVID19.BR, MuMiN-PT) with external evidence. The approach simulates a user's verification process, employing Large Language Models (LLMs, specifically Gemini 1.5 Flash) to extract the main claim from texts and search engine APIs (Google Search API, Google FactCheck Claims Search API) to retrieve…
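The enrichment flow described in the abstract (extract the main claim, then query a search service for evidence) can be outlined roughly as follows. This is a minimal sketch, not the authors' implementation: it does not call Gemini or any Google API, and `Evidence`, `EnrichedItem`, `enrich`, `stub_extract_claim`, and `stub_search` are hypothetical names with stand-in behavior, used only to show how the two stages might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Evidence:
    url: str
    snippet: str


@dataclass
class EnrichedItem:
    text: str
    claim: str
    evidence: List[Evidence] = field(default_factory=list)


def enrich(
    text: str,
    extract_claim: Callable[[str], str],
    search: Callable[[str], List[Evidence]],
    max_results: int = 5,
) -> EnrichedItem:
    """Extract one claim from the text, then attach up to max_results search hits."""
    claim = extract_claim(text).strip()
    if not claim:
        # Nothing to search for; return the item without evidence.
        return EnrichedItem(text=text, claim="")
    results = search(claim)[:max_results]
    return EnrichedItem(text=text, claim=claim, evidence=results)


def stub_extract_claim(text: str) -> str:
    """Stand-in for an LLM call: treat the first sentence as the claim."""
    return text.split(".")[0]


def stub_search(query: str) -> List[Evidence]:
    """Stand-in for a search API call: return one placeholder result."""
    return [Evidence(url="https://example.org/1", snippet=f"Result about: {query}")]


item = enrich("A Terra é plana. Cientistas discordam.", stub_extract_claim, stub_search)
empty_item = enrich("", stub_extract_claim, stub_search)
```

Passing the extractor and the search function as callables keeps the flow testable offline. A real LLM client or search client could be swapped in behind the same two signatures.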
Taxonomy
Topics: Misinformation and Its Impacts · Big Data and Digital Economy · Benford’s Law and Fraud Detection
