Annotation Tool and Dataset for Fact-Checking Podcasts

Vinay Setty; Adam James Becker

arXiv:2502.01402 · cs.CL · February 4, 2025

TL;DR

This paper introduces a real-time podcast annotation tool that combines transcription, crowdsourcing, and machine learning to facilitate fact-checking of multilingual spoken content, and releases a high-quality dataset for further research.

Contribution

The paper presents a novel real-time annotation system for podcasts that integrates transcription, crowdsourcing, and transformer models, and it releases a new dataset for fact-checking tasks.

Findings

01. High-quality annotated podcast dataset created
02. Effective fine-tuning of multilingual models demonstrated
03. Preliminary experiments show promising fact-checking performance

Abstract

Podcasts are a popular medium on the web, featuring diverse and multilingual content that often includes unverified claims. Fact-checking podcasts is a challenging task, requiring transcription, annotation, and claim verification, all while preserving the contextual details of spoken content. Our tool offers a novel approach to tackle these challenges by enabling real-time annotation of podcasts during playback. This unique capability allows users to listen to the podcast and annotate key elements, such as check-worthy claims, claim spans, and contextual errors, simultaneously. By integrating advanced transcription models like OpenAI's Whisper and leveraging crowdsourced annotations, we create high-quality datasets to fine-tune multilingual transformer models such as XLM-RoBERTa for tasks like claim detection and stance classification. Furthermore, we release the annotated podcast…
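The abstract describes labels that users attach while a podcast plays and that several crowd workers contribute. As an illustration only, the sketch below assumes a data layout the paper does not specify. Transcript segments are shaped like Whisper's segment output (dicts with `start`, `end`, and `text`, times in seconds). Annotation events are hypothetical `(annotator_id, playback_time, label)` tuples, and labels such as `checkworthy` are made up for the example. The sketch attaches each event to the segment playing at that moment and resolves disagreement by simple majority vote. This aggregation rule is a stand-in, not the authors' method.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed shape: Whisper-style segments with "start"/"end" (seconds) and "text".
Segment = Dict[str, object]
# Hypothetical annotation event: (annotator_id, playback_time_seconds, label).
Event = Tuple[str, float, str]


def segment_index_at(segments: List[Segment], t: float) -> Optional[int]:
    """Return the index of the segment whose [start, end) interval contains t."""
    starts = [float(s["start"]) for s in segments]  # segments assumed sorted by start
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < float(segments[i]["end"]):
        return i
    return None  # t falls in a gap between segments or outside the audio


def aggregate_labels(segments: List[Segment], events: List[Event]) -> List[Tuple[str, str]]:
    """Attach playback-time labels to segments, then majority-vote per segment."""
    votes: Dict[int, Dict[str, str]] = defaultdict(dict)
    for annotator, t, label in events:  # events assumed in chronological order
        i = segment_index_at(segments, t)
        if i is not None:
            votes[i][annotator] = label  # an annotator's latest label for a segment wins
    rows: List[Tuple[str, str]] = []
    for i, seg in enumerate(segments):
        if i not in votes:
            continue  # no annotator labeled this segment
        # most_common(1) returns the first-seen label among tied counts
        label, _ = Counter(votes[i].values()).most_common(1)[0]
        rows.append((str(seg["text"]), label))
    return rows


segments = [
    {"start": 0.0, "end": 4.0, "text": "Vaccines cause autism."},
    {"start": 4.0, "end": 9.5, "text": "Anyway, welcome back to the show."},
]
events = [
    ("a1", 1.2, "checkworthy"),
    ("a2", 3.9, "checkworthy"),
    ("a3", 2.0, "not_checkworthy"),
    ("a1", 5.0, "not_checkworthy"),
]
print(aggregate_labels(segments, events))
```

Segments nobody labeled are dropped, and events that land in silence between segments are ignored. A fuller pipeline might snap such events to the nearest segment instead, or weight annotators by reliability rather than counting votes equally.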
