DEBISS: a Corpus of Individual, Semi-structured and Spoken Debates

Klaywert Danillo Ferreira de Souza; David Eduardo Pereira; Cláudio E. C. Campelo; and Larissa Lucena Vasconcelos

arXiv:2603.05459·cs.CL·May 1, 2026

TL;DR

The paper introduces DEBISS, a new corpus of spoken, semi-structured debates with diverse NLP annotations, addressing the scarcity of debate datasets for various applications.

Contribution

It presents a comprehensive debate corpus with multi-faceted annotations, facilitating research in speech processing, argument mining, and speaker analysis.

Findings

01

DEBISS includes speech-to-text and speaker diarization annotations.

02

The corpus supports argument mining and debater quality assessment.

03

It addresses the lack of diverse debate corpora in NLP research.

Abstract

Debating is essential in our daily lives, whether in studying, work activities, simple everyday discussions, political debates on TV, or online discussions on social networks. The range of uses for debates is broad. Because debates vary so widely in application, structure, and format, developing corpora that account for these variations can be challenging, and debate corpora remain notably scarce in the literature. For this reason, the current research proposes the DEBISS corpus: a collection of spoken and individual debates with semi-structured features, annotated for a broad range of NLP tasks, such as speech-to-text, speaker diarization, argument mining, and debater quality assessment.
