TL;DR
This study investigates how automatic speech recognition (ASR) errors affect speaker attribution, revealing that speaker identification remains robust even with errorful transcripts, and that ASR errors may sometimes enhance attribution accuracy.
Contribution
First comprehensive analysis of how ASR transcription errors affect speaker attribution performance, highlighting the task's resilience to such errors and the potential advantages of ASR transcripts over human transcriptions.
Findings
Speaker attribution is surprisingly resilient to transcription errors.
ASR errors can sometimes improve speaker attribution accuracy.
How well an ASR system recovers the true transcript has little bearing on attribution performance.
Abstract
Speaker attribution from speech transcripts is the task of identifying a speaker from the transcript of their speech based on patterns in their language use. This task is especially useful when the audio is unavailable (e.g. deleted) or unreliable (e.g. anonymized speech). Prior work in this area has primarily focused on the feasibility of attributing speakers using transcripts produced by human annotators. However, in real-world settings, one often only has more errorful transcripts produced by automatic speech recognition (ASR) systems. In this paper, we conduct what is, to our knowledge, the first comprehensive study of the impact of automatic transcription on speaker attribution performance. In particular, we study the extent to which speaker attribution performance degrades in the face of transcription errors, as well as how properties of the ASR system impact attribution. We find…