Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing

Hao Yang; Jinming Zhao; Gholamreza Haffari; Ehsan Shareghi

arXiv:2210.13030·cs.CL·October 25, 2022

TL;DR

This paper introduces a self-supervised rewiring method for pre-trained speech encoders that improves their representation space, leading to faster and more effective fine-tuning, especially with limited labeled data.

Contribution

The authors propose a label-free, contrastive self-supervised rewiring technique that enhances speech encoder representations and accelerates downstream task fine-tuning.

Findings

01 Improved isotropy in the representation space of wav2vec 2.

02 Significant speedup in fine-tuning convergence across 6 speech tasks.

03 Consistent performance gains in low-resource scenarios.
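
The isotropy finding can be made concrete with a simple proxy. The following is a minimal sketch, assuming isotropy is approximated by the mean pairwise cosine similarity of pooled utterance embeddings; this page does not state which metric the paper uses, so both the metric and the helper names `cosine` and `mean_pairwise_cosine` are illustrative assumptions. A value near 1 suggests embeddings crowded into a narrow cone (anisotropic), while a value near 0 suggests directions are spread more evenly.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs (assumed isotropy proxy)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy embeddings (not real wav2vec 2 outputs): a narrow cone vs. spread-out directions.
narrow = [[1.0, 0.1], [1.0, 0.2], [1.0, -0.1]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

print(mean_pairwise_cosine(narrow))  # close to 1: anisotropic
print(mean_pairwise_cosine(spread))  # much closer to 0: more isotropic
```

In practice the vectors would be pooled encoder outputs computed before and after rewiring, so the two scores can be compared on the same set of utterances.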

Abstract

Pre-trained speech Transformers have facilitated great success across various speech processing tasks. However, fine-tuning these encoders for downstream tasks requires sufficiently large training data to converge or to achieve state-of-the-art performance. In the text domain, this has been partly attributed to sub-optimality of the representation space in pre-trained Transformers. In this work, we take a sober look into pre-trained speech encoders and rewire their representation space without requiring any task-specific labels. Our method utilises a neutrally synthesised version of audio inputs along with frame masking to construct positive pairs for contrastive self-supervised learning. When used for augmenting the wav2vec 2 encoder, we observe consistent improvement of isotropy in the representation space. Our experiments on 6 speech processing tasks exhibit a significant convergence speedup during…
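
The positive-pair idea in the abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the abstract only says "contrastive self-supervised learning", so the InfoNCE-style loss, zero-valued frame masking, mean pooling, the temperature value, and the helper names `mask_frames`, `mean_pool`, and `info_nce` are all assumptions. Toy frame vectors stand in for encoder outputs of the original and synthesised audio.

```python
import math
import random


def mask_frames(frames, mask_prob, rng):
    """Zero out each frame independently with probability mask_prob (assumed masking scheme)."""
    return [[0.0] * len(f) if rng.random() < mask_prob else list(f) for f in frames]


def mean_pool(frames):
    """Average frame vectors into one utterance-level vector (assumed pooling)."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine(u, v, eps=1e-8):
    """Cosine similarity, with eps guarding against all-zero (fully masked) vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: positives[i] pairs with anchors[i]; other rows act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


rng = random.Random(0)
# Toy stand-ins: two utterances, three 2-d frames each (original vs. synthesised view).
original = [[[1.0, 0.0]] * 3, [[0.0, 1.0]] * 3]
synthesised = [[[0.9, 0.1]] * 3, [[0.1, 0.9]] * 3]

anchors = [mean_pool(mask_frames(u, 0.3, rng)) for u in original]
positives = [mean_pool(u) for u in synthesised]
print(info_nce(anchors, positives))
```

Minimising a loss of this shape pulls each utterance toward its synthesised counterpart and away from the other utterances in the batch, which is one common way a contrastive objective reshapes a representation space.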

Code & Models

Repositories

yanghao97/rewirew2v2
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing