A Recorded Debating Dataset
Shachar Mirkin, Michal Jacovi, Tamar Lavee, Hong-Kwang Kuo, Samuel Thomas, Leslie Sager, Lili Kotlerman, Elad Venezian, Noam Slonim

TL;DR
This paper introduces an English debating speech dataset comprising audio recordings together with automatic and manual transcriptions, to support research in computational argumentation and debating technologies.
Contribution
It provides a new, multi-stage debating speech dataset with both automatic and manual transcriptions, facilitating diverse research applications.
Findings
Dataset includes 60 debating speeches on controversial topics.
Multiple transcript formats support various NLP tasks.
Resource aims to enhance debate-specific speech and argumentation research.
Abstract
This paper describes an English audio and textual dataset of debating speeches, a unique resource for the growing research field of computational argumentation and debating technologies. We detail the process of speech recording by professional debaters, the transcription of the speeches with an Automatic Speech Recognition (ASR) system, their consequent automatic processing to produce a text that is more "NLP-friendly", and in parallel -- the manual transcription of the speeches in order to produce gold-standard "reference" transcripts. We release 60 speeches on various controversial topics, each in five formats corresponding to the different stages in the production of the data. The intention is to allow utilizing this resource for multiple research purposes, be it the addition of in-domain training data for a debate-specific ASR system, or applying argumentation mining on either…
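Because each speech comes with both an ASR transcript and a manually produced reference transcript, one natural use of the resource is measuring how far the automatic output is from the reference. The following is a minimal sketch of word error rate (WER) computed by word-level edit distance. The function name `word_error_rate`, the whitespace tokenization, the lowercasing, and the sample sentences are all assumptions made for illustration; they are not taken from the paper or from the dataset's files.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: plain whitespace tokenization and lowercasing. A real
    evaluation would normally apply the same text normalization to both
    transcripts before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, not drawn from the dataset.
reference_text = "we should ban gambling"
asr_text = "we should band gambling"
print(word_error_rate(reference_text, asr_text))
```

The score counts substitutions, deletions, and insertions equally, so it can exceed 1.0 when the hypothesis is much longer than the reference.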
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Natural Language Processing Techniques · Topic Modeling
