Self-Supervised Learning for Multi-Channel Neural Transducer

Atsushi Kojima

arXiv:2408.02945·cs.CL·August 7, 2024

TL;DR

This paper introduces a self-supervised learning approach, based on wav2vec 2.0, for multi-channel end-to-end speech recognition. Of the three feature quantization methods compared in pre-training, feature-wise quantization was the most effective on the far-field in-house and CHiME-4 datasets.

Contribution

It extends wav2vec 2.0 self-supervised learning to a multi-channel neural transducer and compares three feature quantization methods for pre-training the multi-channel conformer audio encoder: joint, feature-wise and channel-wise quantization.

Findings

1. Feature-wise quantization was the most effective of the three quantization methods compared.

2. The reported gain is a 66% relative reduction in character error rate.

3. Experiments covered both the far-field in-house and CHiME-4 datasets.
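
As a reading aid for the second finding, the sketch below shows how a relative reduction in error rate is usually computed. The function name and the two error rates are hypothetical and chosen only so the arithmetic comes out near 66%; they are not values reported in the paper.

```python
def relative_reduction(baseline_cer: float, improved_cer: float) -> float:
    """Return the relative reduction of improved_cer with respect to baseline_cer.

    Both arguments are character error rates in the same unit (for example, percent).
    """
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return (baseline_cer - improved_cer) / baseline_cer


# Hypothetical numbers, NOT taken from the paper:
# a baseline CER of 10.0% and an improved CER of 3.4%.
reduction = relative_reduction(10.0, 3.4)
print(f"relative reduction: {reduction:.0%}")
```

A relative reduction is measured against the baseline error rate, so it says nothing on its own about the absolute error rates involved.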

Abstract

Self-supervised learning, such as with the wav2vec 2.0 framework, significantly improves the accuracy of end-to-end automatic speech recognition (ASR). Wav2vec 2.0 has so far been applied to single-channel end-to-end ASR models. In this work, we explored a self-supervised learning method for a multi-channel end-to-end ASR model based on the wav2vec 2.0 framework. As the multi-channel end-to-end ASR model, we focused on a multi-channel neural transducer. In pre-training, we compared three different methods for feature quantization to train a multi-channel conformer audio encoder: joint quantization, feature-wise quantization and channel-wise quantization. In fine-tuning, we trained the multi-channel conformer-transducer. All experiments were conducted using the far-field in-house and CHiME-4 datasets. The results of the experiments showed that feature-wise quantization was the most effective…
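
The abstract names the three quantization strategies without defining them. The sketch below shows one plausible reading of the names, offered as an assumption rather than the paper's definition: the strategies differ in how multi-channel features are grouped before each group is assigned discrete targets. The tensor layout `(channels, feature_types, time, dim)`, the helper names, and the nearest-neighbour codebook lookup are all hypothetical. wav2vec 2.0 itself learns its codebooks with a Gumbel-softmax product quantizer, for which the random codebooks used here are only a simple stand-in.

```python
import numpy as np


def split_groups(feats: np.ndarray, mode: str) -> list:
    """Split feats of shape (C, F, T, D) into per-quantizer groups of shape (T, *).

    ASSUMPTION: this grouping is inferred from the method names only.
    """
    C, F, T, D = feats.shape
    if mode == "joint":
        # One group: every channel and feature type concatenated per frame.
        return [feats.transpose(2, 0, 1, 3).reshape(T, C * F * D)]
    if mode == "feature_wise":
        # One group per feature type, concatenated over channels.
        return [feats[:, f].transpose(1, 0, 2).reshape(T, C * D) for f in range(F)]
    if mode == "channel_wise":
        # One group per channel, concatenated over feature types.
        return [feats[c].transpose(1, 0, 2).reshape(T, F * D) for c in range(C)]
    raise ValueError(f"unknown mode: {mode}")


def nearest_code(x: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return the index of the closest codebook entry for each row of x."""
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def quantize(feats: np.ndarray, mode: str, num_codes: int = 8, seed: int = 0) -> np.ndarray:
    """Return discrete targets of shape (T, num_groups) for the chosen grouping."""
    rng = np.random.default_rng(seed)
    targets = []
    for group in split_groups(feats, mode):
        # Random codebook as a stand-in for a learned one (hypothetical).
        codebook = rng.standard_normal((num_codes, group.shape[1]))
        targets.append(nearest_code(group, codebook))
    return np.stack(targets, axis=1)


# Toy input: 4 channels, 2 feature types, 10 frames, 3 dimensions each.
feats = np.random.default_rng(1).standard_normal((4, 2, 10, 3))
for mode in ("joint", "feature_wise", "channel_wise"):
    print(mode, quantize(feats, mode).shape)
```

Under this reading, joint quantization yields one target per frame, feature-wise quantization yields one target per feature type, and channel-wise quantization yields one target per microphone channel.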

Taxonomy

Topics: Sensor Technology and Measurement Systems · Neural Networks and Applications