Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models
Minh Nguyen, Franck Dernoncourt, Seunghyun Yoon, Hanieh Deilamsalehy, Hao Tan, Ryan Rossi, Quan Hung Tran, Trung Bui, Thien Huu Nguyen

TL;DR
This paper presents a new large-scale dataset and transformer-based models for text-based speaker identification in dialogue transcripts; the best model reaches 80.3% precision.
Contribution
It introduces a large-scale dataset derived from MediaSum and develops transformer models tailored for speaker identification, addressing the lack of large-scale, diverse training data for the task.
Findings
Achieved 80.3% precision in speaker identification
Developed models leveraging contextual dialogue cues
Provided publicly available dataset and code
Abstract
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite advances in speech recognition, text-based speaker identification (SpeakerID) has received limited attention, and the field lacks large-scale, diverse datasets for effective model training. To address these gaps, we present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose novel transformer-based models tailored for SpeakerID that leverage contextual cues within dialogues to accurately attribute speaker names. In extensive experiments, our best model achieves a precision of 80.3%, setting a new benchmark for SpeakerID. The data and code are publicly available here:…
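To make the task framing and the reported metric concrete, here is a minimal sketch of how precision might be computed for SpeakerID. The transcript layout, the anonymized `SPEAKER_n` labels, the names, the `precision` helper, and the choice to exclude abstentions and require exact name matches are all illustrative assumptions; they are not taken from the paper or its released code.

```python
from typing import Dict, Optional


def precision(predictions: Dict[str, Optional[str]], gold: Dict[str, str]) -> float:
    """Fraction of non-abstaining predictions that exactly match the gold name.

    Assumption: a prediction of None means the model abstained, and
    abstentions are excluded from the denominator.
    """
    made = {spk: name for spk, name in predictions.items() if name is not None}
    if not made:
        return 0.0
    correct = sum(1 for spk, name in made.items() if gold.get(spk) == name)
    return correct / len(made)


# Hypothetical toy transcript: speaker labels are anonymized, and the names
# must be recovered from contextual cues in the dialogue text.
transcript = [
    ("SPEAKER_1", "Welcome back. I'm joined by Jane Doe."),
    ("SPEAKER_2", "Thanks for having me, John."),
    ("SPEAKER_3", "And I'm reporting live from the scene."),
]

gold = {"SPEAKER_1": "John Smith", "SPEAKER_2": "Jane Doe", "SPEAKER_3": "Alex Lee"}

# Hypothetical model output: one partial name, one exact name, one abstention.
predictions = {"SPEAKER_1": "John", "SPEAKER_2": "Jane Doe", "SPEAKER_3": None}

score = precision(predictions, gold)
print(f"precision = {score:.2f}")
```

Under these assumptions, the partial name "John" counts as incorrect and the abstention is ignored, so one of the two predictions made is correct. A more lenient matching rule (for example, accepting a first name alone) would change the score, which is why the matching criterion should be checked against the paper before comparing numbers.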
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques
