Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data
Nattapong Sanchan, Ahmet Aker, Kalina Bontcheva

TL;DR
This paper introduces a new dataset of annotated online debate summaries, evaluates inter-annotator agreement, and presents an initial extractive summarization system tailored for online debate data.
Contribution
The paper provides an annotated online debate dataset for summarization, a genre for which the authors report that annotated resources are insufficient, and explores features for automatic summarization of online debates.
Findings
Inter-annotator agreement among five human annotators, based on semantic similarity, is 36% (Cohen's kappa) and 48% (Krippendorff's alpha).
A baseline extractive summarization system for online debates is implemented.
The paper discusses key features for effective debate summarization.
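For readers unfamiliar with the agreement figures above, the following is a minimal sketch of the standard Cohen's kappa for two annotators: observed agreement corrected by the agreement expected from each annotator's label frequencies. It assumes exact-match binary labels (1 = sentence selected for the summary, 0 = not selected). The paper's figure is based on semantic similarity and involves five annotators, so this sketch does not reproduce the authors' procedure. The function name `cohens_kappa` and the label lists are hypothetical and for illustration only.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Standard Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label proportions,
    # summed over all labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical selections for ten sentences (illustrative, not from the paper).
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

With more than two annotators, one common option is to average kappa over all annotator pairs; whether the paper does this is not stated in the summary above.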
Abstract
Usage of online textual media is steadily increasing. Daily, more and more news stories, blog posts and scientific articles are added to the online volumes. These are all freely accessible and have been employed extensively in multiple research areas, e.g. automatic text summarization, information retrieval, information extraction, etc. Meanwhile, online debate forums have recently become popular, but have remained largely unexplored. For this reason, there are insufficient resources of annotated debate data available for conducting research in this genre. In this paper, we collected and annotated debate data for an automatic summarization task. As in extractive gold standard summary generation, our data contains sentences worth including in a summary. Five human annotators performed this task. Inter-annotator agreement, based on semantic similarity, is 36% for Cohen's kappa…
