SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation
Junfeng Jiang, Chengzhang Dong, Sadao Kurohashi, Akiko Aizawa

TL;DR
This paper introduces SuperDialseg, a large-scale supervised dataset for dialogue segmentation, along with benchmarks and evaluations demonstrating the effectiveness of supervised learning in this task.
Contribution
The paper provides a new large-scale supervised dataset for dialogue segmentation and a comprehensive benchmark with multiple models and evaluation metrics.
Findings
Supervised models outperform unsupervised methods in dialogue segmentation.
Models trained on SuperDialseg generalize well to out-of-domain data.
The dataset's quality is verified through human annotation and a Kappa score.
Abstract
Dialogue segmentation is a crucial task for dialogue systems, allowing a better understanding of conversational texts. Despite recent progress in unsupervised dialogue segmentation methods, their performance is limited by the lack of explicit supervised signals for training. Furthermore, the precise definition of segmentation points in conversations remains a challenging problem, increasing the difficulty of collecting manual annotations. In this paper, we provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues and release a large-scale supervised dataset called SuperDialseg, which contains 9,478 dialogues based on two prevalent document-grounded dialogue corpora and inherits their useful dialogue-related annotations. Moreover, we provide a benchmark including 18 models across five categories for the dialogue segmentation…
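As a rough illustration of how document grounding could yield segmentation points, the sketch below assumes each utterance carries an identifier for the document section it is grounded in, and marks an utterance as the end of a segment when the next utterance is grounded in a different section. The function name, the section identifiers, and the labeling convention (1 for the last utterance of a segment) are assumptions made for illustration; they are not taken from the SuperDialseg release or its schema.

```python
from typing import List


def boundaries_from_grounding(section_ids: List[str]) -> List[int]:
    """Label each utterance 1 if it closes a segment, else 0.

    Assumption: a segment ends when the next utterance is grounded in a
    different document section, or when the dialogue ends.
    """
    labels = []
    for i, current in enumerate(section_ids):
        is_last = i == len(section_ids) - 1
        labels.append(1 if is_last or section_ids[i + 1] != current else 0)
    return labels


# Hypothetical dialogue of six utterances grounded in three sections.
grounding = ["sec_A", "sec_A", "sec_B", "sec_B", "sec_B", "sec_C"]
labels = boundaries_from_grounding(grounding)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Labels of this shape turn segmentation into per-utterance binary classification, which is one common way to frame the supervised setting the abstract describes.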
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
