ASR-Aware End-to-end Neural Diarization
Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke

TL;DR
This paper introduces an end-to-end neural diarization (EEND) model that integrates features derived from automatic speech recognition (ASR) and proposes architectural modifications to exploit them, achieving a 20% relative reduction in diarization error rate (DER) on conversational speech (Switchboard+SRE).
Contribution
It presents a novel Conformer-based EEND model that incorporates ASR features in three ways (concatenation with acoustic features, contextualized self-attention, and multi-task learning), improving diarization accuracy.
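Concatenation, the simplest of these mechanisms, only requires ASR features that are aligned to the acoustic frames. The sketch below is a minimal illustration, assuming Kaldi-style B/I/E/S position-in-word tags (begin, internal, end, single-phone word) obtained from a phone alignment and one-hot encoded per frame. The tag inventory, the helper names (`position_in_word`, `expand_to_frames`, `concat_asr_features`), and the toy inputs are hypothetical and do not reproduce the paper's feature pipeline; non-speech frames are not handled.

```python
from typing import List

# Hypothetical tag inventory (assumption): Kaldi-style word-position tags.
# B = word-begin, I = word-internal, E = word-end, S = single-phone word.
POS_TAGS = ["B", "I", "E", "S"]


def position_in_word(phones_per_word: List[int]) -> List[str]:
    """Return one position tag per phone, given each word's phone count."""
    tags: List[str] = []
    for n in phones_per_word:
        if n < 1:
            raise ValueError("every word must contain at least one phone")
        if n == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (n - 2) + ["E"])
    return tags


def expand_to_frames(phone_tags: List[str], phone_frames: List[int]) -> List[str]:
    """Repeat each phone's tag for the number of frames that phone spans."""
    if len(phone_tags) != len(phone_frames):
        raise ValueError("need exactly one duration per phone")
    frame_tags: List[str] = []
    for tag, n_frames in zip(phone_tags, phone_frames):
        frame_tags.extend([tag] * n_frames)
    return frame_tags


def concat_asr_features(acoustic: List[List[float]], frame_tags: List[str]) -> List[List[float]]:
    """Append a one-hot position-in-word vector to every acoustic frame."""
    if len(acoustic) != len(frame_tags):
        raise ValueError("acoustic frames and ASR tags must be frame-aligned")
    out = []
    for frame, tag in zip(acoustic, frame_tags):
        one_hot = [0.0] * len(POS_TAGS)
        one_hot[POS_TAGS.index(tag)] = 1.0
        out.append(list(frame) + one_hot)
    return out


# Usage: two words with 3 and 1 phones; per-phone durations given in frames.
tags = position_in_word([3, 1])                    # ['B', 'I', 'E', 'S']
frame_tags = expand_to_frames(tags, [2, 1, 1, 2])  # 6 frames
acoustic = [[0.1, 0.2] for _ in frame_tags]        # toy 2-dim acoustic frames
features = concat_asr_features(acoustic, frame_tags)
print(features[0])  # [0.1, 0.2, 1.0, 0.0, 0.0, 0.0]
```

The encoder then sees a wider input vector per frame; the same pattern extends to phone identities or word-boundary flags by appending further one-hot blocks.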
Findings
Achieves a 20% relative DER reduction on the Switchboard+SRE datasets.
Demonstrates the effectiveness of multi-task learning with position-in-word features.
Shows that ASR-derived features improve speaker diarization performance.
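DER is the metric behind the 20% figure. The sketch below is a minimal frame-level version, assuming binary per-frame speaker activities, no forgiveness collar, scored overlap, and an exhaustive search over speaker permutations (practical only for a handful of speakers). It follows the usual miss / false-alarm / confusion decomposition but is not the scoring setup used in the paper; segment-based tools such as NIST `md-eval.pl` typically apply a collar, so their numbers will differ.

```python
from itertools import permutations
from typing import Sequence


def frame_der(ref: Sequence[Sequence[int]], hyp: Sequence[Sequence[int]]) -> float:
    """Frame-level DER under the best speaker mapping (no collar; overlap scored).

    ref and hyp hold one binary activity vector per frame, with the same
    number of speaker columns in both.
    """
    n_spk = len(ref[0])
    total_ref = sum(sum(frame) for frame in ref)  # total reference speaker-frames
    if total_ref == 0:
        raise ValueError("reference contains no speech")
    best_errors = None
    # EEND outputs are unordered, so try every mapping of hyp speakers to ref speakers.
    for perm in permutations(range(n_spk)):
        errors = 0
        for r, h in zip(ref, hyp):
            mapped = [h[p] for p in perm]
            n_ref, n_hyp = sum(r), sum(mapped)
            n_correct = sum(1 for a, b in zip(r, mapped) if a and b)
            miss = max(n_ref - n_hyp, 0)
            false_alarm = max(n_hyp - n_ref, 0)
            confusion = min(n_ref, n_hyp) - n_correct
            errors += miss + false_alarm + confusion
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_ref


# Usage: the hypothesis swaps the two speakers and misses one frame.
ref = [[1, 0], [1, 0], [0, 1], [0, 1]]
hyp = [[0, 1], [0, 0], [1, 0], [1, 0]]
print(frame_der(ref, hyp))  # 0.25
```

A relative reduction is (DER_baseline - DER_new) / DER_baseline; with hypothetical numbers, going from 10.0% to 8.0% DER is a 20% relative reduction.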
Abstract
We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model. Two categories of features are explored: features derived directly from ASR output (phones, position-in-word and word boundaries) and features derived from a lexical speaker change detection model, trained by fine-tuning a pretrained BERT model on the ASR output. Three modifications to the Conformer-based EEND architecture are proposed to incorporate the features. First, ASR features are concatenated with acoustic features. Second, we propose a new attention mechanism called contextualized self-attention that utilizes ASR features to build robust speaker representations. Finally, multi-task learning is used to train the model to minimize classification loss for the ASR features along with diarization loss.…
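The multi-task objective described above can be read as L = L_diar + λ · L_ASR. The sketch below is a minimal illustration, assuming the diarization term is the permutation-invariant binary cross-entropy used in standard EEND training and the auxiliary term is a frame-level softmax cross-entropy over ASR-feature classes such as position-in-word tags. The weight `asr_weight` (λ), its default of 0.1, and all function names are hypothetical, not values or code from the paper.

```python
import math
from itertools import permutations
from typing import Sequence


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability/label pair, clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def pit_bce(probs: Sequence[Sequence[float]], labels: Sequence[Sequence[int]]) -> float:
    """Permutation-invariant BCE: mean BCE under the best speaker ordering."""
    n_frames, n_spk = len(labels), len(labels[0])
    best = math.inf
    for perm in permutations(range(n_spk)):
        total = sum(bce(probs[t][s], labels[t][perm[s]])
                    for t in range(n_frames) for s in range(n_spk))
        best = min(best, total / (n_frames * n_spk))
    return best


def cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean softmax cross-entropy over frames (e.g. position-in-word classes)."""
    total = 0.0
    for row, y in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(targets)


def multitask_loss(diar_probs, diar_labels, asr_logits, asr_targets,
                   asr_weight: float = 0.1) -> float:
    """Diarization loss plus a weighted auxiliary ASR-feature classification loss."""
    return pit_bce(diar_probs, diar_labels) + asr_weight * cross_entropy(asr_logits, asr_targets)


# Usage: predictions that are correct up to a speaker swap, plus one ASR frame.
labels = [[1, 0], [0, 1]]
probs = [[0.1, 0.9], [0.9, 0.1]]
loss = multitask_loss(probs, labels, [[0.0, 0.0, 0.0, 0.0]], [2], asr_weight=0.5)
print(round(loss, 4))
```

Because the diarization term searches over speaker orderings, a speaker swap is not penalized, while the auxiliary term gives the shared encoder a direct signal about lexical structure.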
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Weight Decay · WordPiece · Dense Connections · Residual Connection · Linear Warmup With Linear Decay · Softmax · Layer Normalization
