Hierarchical Summarization for Longform Spoken Dialog
Daniel Li, Thomas Chen, Albert Tung, Lydia Chilton

TL;DR
This paper introduces a hierarchical summarization system for longform spoken dialog that improves user navigation and understanding by combining ASR and text summarization with semantic segmentation, addressing speech-specific challenges.
Contribution
The paper presents a novel two-stage ASR and summarization pipeline with semantic segmentation algorithms tailored for spoken dialog, enhancing navigation and error recovery.
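The shape of such a pipeline can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the ASR stage has already produced a list of utterance strings, it substitutes a simple word-overlap (Jaccard) heuristic for the paper's semantic segmentation, and it uses a truncating placeholder where a real text summarization model would go. All function names and the threshold value are hypothetical.

```python
import re


def tokens(text):
    """Lowercased word set for one utterance."""
    return set(re.findall(r"[a-z']+", text.lower()))


def segment(utterances, threshold=0.1):
    """Group adjacent utterances; open a new segment when word overlap
    (Jaccard similarity) with the previous utterance drops below threshold.
    Assumption: a stand-in heuristic, not the paper's segmentation algorithm."""
    segments = []
    for utt in utterances:
        if segments:
            prev, cur = tokens(segments[-1][-1]), tokens(utt)
            union = prev | cur
            sim = len(prev & cur) / len(union) if union else 0.0
            if sim >= threshold:
                segments[-1].append(utt)
                continue
        segments.append([utt])
    return segments


def summarize(texts, max_words=12):
    """Placeholder summarizer: keeps the first max_words words.
    Assumption: a real system would call a summarization model here."""
    words = " ".join(texts).split()
    return " ".join(words[:max_words])


def hierarchical_summary(utterances, threshold=0.1):
    """Build a two-level tree: a top summary over per-segment summaries,
    each of which keeps its source utterances for drill-down."""
    children = [
        {"summary": summarize(seg), "utterances": seg}
        for seg in segment(utterances, threshold)
    ]
    root = summarize([child["summary"] for child in children])
    return {"summary": root, "children": children}
```

Keeping the source utterances under each segment summary is what lets a reader skim the top level and then drill down to the transcript, including to check a summary against a possible recognition error.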
Findings
Users prefer hierarchical summaries for quick skimming.
The system effectively resolves speech modeling challenges.
Hierarchical summarization improves content navigation.
Abstract
Every day we are surrounded by spoken dialog. This medium delivers rich, diverse streams of information auditorily; however, systematically understanding dialog can often be non-trivial. Despite the pervasiveness of spoken dialog, automated speech understanding and quality information extraction remain markedly poor, especially when compared to written prose. Furthermore, compared to understanding text, auditory communication poses many additional challenges such as speaker disfluencies, informal prose styles, and lack of structure. These concerns all demonstrate the need for a distinctly speech-tailored interactive system to help users understand and navigate the spoken language domain. While individual automatic speech recognition (ASR) and text summarization methods already exist, they are imperfect technologies; neither considers user purpose and intent nor addresses spoken language…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
