TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining

Viet-Anh Nguyen, Anh H. T. Nguyen, and Andy W. H. Khong

arXiv:2110.13492·cs.LG·June 8, 2022


TL;DR

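This paper presents TUNet, a block-online bandwidth extension model for speech. It simplifies the UNet backbone of TFiLM to cut inference time, adds an efficient transformer at the bottleneck, and uses self-supervised pretraining and data augmentation to improve the quality of the bandwidth-extended signal.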

Contribution

The paper introduces TUNet, a block-online variant of the TFiLM model. It simplifies TFiLM's UNet backbone, places an efficient transformer at the bottleneck to offset the resulting performance loss, and adds self-supervised pretraining and data augmentation.
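As a rough illustration of what block-online processing means, the sketch below runs a model over consecutive fixed-size blocks so that each output block depends only on the current input block. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `process_block_online` and `placeholder_model` are hypothetical, the blocks are assumed not to overlap, and the input is assumed to be already resampled to the target rate so that input and output blocks have the same length. This summary does not state the block size or any overlap handling used in the paper.

```python
from typing import Callable, List, Sequence

Block = List[float]


def process_block_online(
    signal: Sequence[float],
    block_size: int,
    model: Callable[[Block], Block],
) -> List[float]:
    """Apply `model` to consecutive non-overlapping blocks of `signal`.

    The final block is zero-padded to `block_size` and the padding is
    trimmed from the output, so the result has the same length as the input.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    output: List[float] = []
    for start in range(0, len(signal), block_size):
        block = list(signal[start:start + block_size])
        valid = len(block)  # number of real (unpadded) samples
        block.extend([0.0] * (block_size - valid))

        enhanced = model(block)
        if len(enhanced) != block_size:
            raise ValueError("model must return a block of the same length")

        output.extend(enhanced[:valid])
    return output


def placeholder_model(block: Block) -> Block:
    """Hypothetical stand-in for a trained network: doubles every sample."""
    return [2.0 * x for x in block]


if __name__ == "__main__":
    print(process_block_online([1.0, 2.0, 3.0, 4.0, 5.0], 2, placeholder_model))
```

Because each block is handled independently, latency is bounded by the block length rather than by the length of the whole utterance, which is the practical appeal of a block-online design.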

Findings

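01 Outperforms several recent baselines on the VCTK dataset in both intrusive and non-intrusive metrics

02 Self-supervised pretraining and filter augmentation stabilize and improve overall performance, and reduce sensitivity to the downsampling method

03 Reduces inference time relative to TFiLM by simplifying the UNet backbone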

Abstract

We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension. The proposed architecture simplifies the UNet backbone of the TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate performance degradation. We also utilize self-supervised pretraining and data augmentation to enhance the quality of bandwidth-extended signals and reduce the sensitivity with respect to downsampling methods. Experimental results on the VCTK dataset show that the proposed method outperforms several recent baselines in both intrusive and non-intrusive metrics. Pretraining and filter augmentation also help stabilize and enhance the overall performance.
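One way to picture filter augmentation is to generate each low-resolution training input with a randomly chosen anti-aliasing filter before decimation, so that the model does not overfit to a single downsampling method. The sketch below is an illustration under stated assumptions, not the paper's recipe: the abstract does not say which filters are used, so the windowed-sinc design, the tap counts, the window choices, and the cutoff range here are all hypothetical, as are the function names.

```python
import math
import random
from typing import List, Sequence


def windowed_sinc_lowpass(num_taps: int, cutoff: float, window: str) -> List[float]:
    """Design a low-pass FIR filter with unity DC gain.

    `cutoff` is a fraction of the sampling rate (0 < cutoff < 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (sinc), with the x == 0 limit.
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        phase = 2 * math.pi * n / (num_taps - 1)
        if window == "hann":
            w = 0.5 - 0.5 * math.cos(phase)
        else:  # "hamming"
            w = 0.54 - 0.46 * math.cos(phase)
        taps.append(ideal * w)
    total = sum(taps)
    return [t / total for t in taps]


def filter_and_downsample(
    signal: Sequence[float], factor: int, rng: random.Random
) -> List[float]:
    """Low-pass `signal` with a randomly drawn filter, then keep every `factor`-th sample."""
    num_taps = rng.choice([31, 63, 127])           # assumed filter lengths
    window = rng.choice(["hann", "hamming"])       # assumed window shapes
    cutoff = (0.5 / factor) * rng.uniform(0.8, 1.0)  # at or just below the new Nyquist

    taps = windowed_sinc_lowpass(num_taps, cutoff, window)
    half = num_taps // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    # "Same"-length filtering; the taps are symmetric, so this equals convolution.
    filtered = [
        sum(t * padded[i + k] for k, t in enumerate(taps))
        for i in range(len(signal))
    ]
    return filtered[::factor]


if __name__ == "__main__":
    rng = random.Random(0)
    tone = [math.sin(2 * math.pi * 0.01 * n) for n in range(400)]
    low_res = filter_and_downsample(tone, 2, rng)
    print(len(tone), "->", len(low_res))
```

Drawing a new filter for every training example exposes the model to a range of band-limiting behaviours, which matches the abstract's stated aim of reducing sensitivity to the downsampling method.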


Code & Models

Repositories

nxtproduct/tunet (PyTorch, official)
