TL;DR
This paper introduces Non-Autoregressive Predictive Coding (NPC), a fast, parallelizable self-supervised speech representation learning method based on local dependencies, achieving comparable performance to existing methods with improved efficiency.
Contribution
The paper proposes NPC, a novel non-autoregressive approach for speech representation learning that relies solely on local dependencies and is easily implementable with Masked Convolution Blocks.
Findings
NPC achieves comparable phonetic and speaker classification accuracy to existing methods.
NPC offers a significant inference speedup because it is parallelizable in time and has a fixed inference time per time step, regardless of the input sequence length.
Theoretical and empirical analyses confirm NPC's effectiveness and efficiency.
Abstract
Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on the autoregressive model and/or observed global dependencies while generating the representation. In this work, we propose Non-Autoregressive Predictive Coding (NPC), a self-supervised method, to learn a speech representation in a non-autoregressive manner by relying only on local dependencies of speech. NPC has a conceptually simple objective and can be implemented easily with the introduced Masked Convolution Blocks. NPC offers a significant speedup for inference since it is parallelizable in time and has a fixed inference time for each time step regardless of the input sequence length. We discuss and verify the effectiveness of NPC by theoretically and empirically comparing it with other methods. We show that…
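To make the idea of a masked convolution concrete, here is a minimal sketch in plain Python. It is not the authors' Masked Convolution Block, whose exact design is not given in this summary. It assumes a single feature channel, zero padding, and a mask that zeroes the central kernel taps so that the output at time t never sees the input at time t; the names `masked_conv1d` and `mask_radius` are hypothetical, chosen for illustration only.

```python
def masked_conv1d(x, kernel, mask_radius=0):
    """Same-length 1D cross-correlation whose central kernel taps are zeroed.

    x           -- list of floats, one feature channel over time
    kernel      -- list of floats with odd length
    mask_radius -- taps within this distance of the center are masked
                   (hypothetical parameter, an assumption of this sketch)
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2

    # Zero the taps around the center so frame t cannot contribute to output t.
    masked = [0.0 if abs(i - half) <= mask_radius else w
              for i, w in enumerate(kernel)]

    out = []
    # Each output frame reads only a fixed local window of the input, so the
    # iterations are independent of one another and could run in parallel.
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(masked):
            j = t + i - half
            if 0 <= j < len(x):  # positions outside the sequence act as zeros
                acc += w * x[j]
        out.append(acc)
    return out


if __name__ == "__main__":
    frames = [1.0, 2.0, 3.0, 4.0, 5.0]
    weights = [1.0, 1.0, 1.0, 1.0, 1.0]
    print(masked_conv1d(frames, weights))  # prints [5.0, 8.0, 12.0, 10.0, 7.0]
```

Under these assumptions, each output frame depends only on a fixed-size neighbourhood of the input, which illustrates the two properties the abstract names: reliance on local dependencies only, and a per-step cost that does not grow with sequence length.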
Taxonomy
Methods: Convolution · Masked Convolution
