TL;DR
This paper introduces a word-level speech encoder trained for cross-task transfer learning, demonstrating its effectiveness across diverse speech processing tasks and outperforming or matching task-specific methods.
Contribution
The paper presents a novel pre-trained encoder for word-level speech representations that enables effective cross-task transfer learning in speech processing.
Findings
Pre-trained encoder improves performance across multiple speech tasks.
Simple application of the encoder often outperforms task-specific methods.
Representation transferability is validated across different datasets.
Abstract
Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In recent years, unsupervised and self-supervised techniques for learning speech representations have been developed to foster automatic speech recognition. To date, most of these approaches are task-specific and designed for within-task transfer learning between different datasets or setups of a particular task. In turn, learning task-independent representations of speech and cross-task applications of transfer learning remain less common. Here, we introduce an encoder capturing word-level representations of speech for cross-task transfer learning. We demonstrate the application of the pre-trained encoder in four distinct speech and audio processing tasks: (i) speech enhancement, (ii) language identification, (iii) speech, noise, and music classification, and (iv) speaker identification. In…
