Dialogue Disentanglement in Software Engineering: How Far are We?
Ziyou Jiang, Lin Shi, Celia Chen, Jun Hu, Qing Wang

TL;DR
This paper evaluates current dialog disentanglement methods on software engineering chats, identifies their limitations, introduces a new measure, DLD, and analyzes common failure cases to guide future improvements.
Contribution
It provides a comprehensive evaluation of existing approaches, introduces a novel measure, DLD, and identifies key issues that affect disentanglement quality in software chats.
Findings
Existing approaches perform poorly on technical dialogs.
Current measures do not reflect human satisfaction accurately.
Four common failure cases in disentanglement are identified.
Abstract
Software chat messages contain valuable information, but disentangling them into distinct conversations is an essential prerequisite for any in-depth analysis that uses this information. To provide a better understanding of the current state of the art, we evaluate five popular dialog disentanglement approaches on software-related chat. We find that existing approaches do not perform well on disentangling software-related dialogs that discuss technical and complex topics. Further investigation of how well existing disentanglement measures reflect human satisfaction shows that these measures cannot correctly indicate human satisfaction with disentanglement results. Therefore, in this paper, we introduce and evaluate a novel measure, named DLD. Using the human satisfaction results, we further summarize the four most frequent types of bad disentanglement cases on…
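The abstract refers to existing disentanglement measures without defining them. As a hedged illustration only (this is not DLD, whose definition is not given on this page), the sketch below computes normalized mutual information (NMI), a clustering-style measure used in prior disentanglement work, by treating each message's conversation id as a cluster label. The function names `entropy` and `nmi` are hypothetical, and the arithmetic-mean normalization is an assumption; other normalizations exist.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a message-to-conversation assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(gold, pred):
    """Normalized mutual information between gold and predicted disentanglements.

    gold[i] and pred[i] are the conversation ids assigned to message i.
    Hypothetical helper; normalizes by the arithmetic mean of the two entropies.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same messages")
    n = len(gold)
    joint = Counter(zip(gold, pred))
    g_counts = Counter(gold)
    p_counts = Counter(pred)
    mi = 0.0
    for (g, p), c in joint.items():
        # p(g,p) * log(p(g,p) / (p(g) * p(p))), rewritten with raw counts
        mi += (c / n) * math.log(c * n / (g_counts[g] * p_counts[p]))
    denom = (entropy(gold) + entropy(pred)) / 2
    if denom == 0:
        # Both assignments put every message into a single conversation.
        return 1.0
    return mi / denom
```

For example, `nmi([0, 0, 1, 1, 0], [0, 0, 1, 0, 0])` scores a prediction that wrongly merges the fourth message into the first conversation; the result falls strictly between 0 and 1, while a prediction that matches the gold threads up to renaming of ids scores 1.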
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Speech and dialogue systems
