SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

Takaaki Saeki; Shinnosuke Takamichi; Tomohiko Nakamura; Naoko Tanji; Hiroshi Saruwatari

arXiv:2203.12937·cs.SD·June 29, 2022

TL;DR

SelfRemaster is a self-supervised speech restoration method that restores degraded speech without paired data. It uses an analysis-by-synthesis approach with channel modeling and surpasses previous supervised methods in quality.

Contribution

It introduces a novel self-supervised framework with analysis, synthesis, and channel modules that better model real-world acoustic distortions for speech restoration.

Findings

01. Outperforms previous supervised methods in speech restoration quality

02. Works effectively with real degraded speech data

03. Enables audio effect transfer by extracting and adding distortions

Abstract

We present a self-supervised speech restoration method without paired speech corpora. Because the previous general speech restoration method uses artificial paired data created by applying various distortions to high-quality speech corpora, it cannot sufficiently represent acoustic distortions of real data, limiting the applicability. Our model consists of analysis, synthesis, and channel modules that simulate the recording process of degraded speech and is trained with real degraded speech data in a self-supervised manner. The analysis module extracts distortionless speech features and distortion features from degraded speech, while the synthesis module synthesizes the restored speech waveform, and the channel module adds distortions to the speech waveform. Our model also enables audio effect transfer, in which only acoustic distortions are extracted from degraded speech and added to…
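The data flow described in the abstract can be sketched as a toy pipeline. This is a minimal illustration, not the authors' implementation: the distortion is reduced to a single hypothetical gain factor, the synthesis step is an identity function, and the names `DistortionFeatures`, `restore_and_score`, and `transfer_effect` are invented for this example. The reconstruction loss between the re-degraded output and the input is an assumed training signal, chosen because it needs no clean reference.

```python
from dataclasses import dataclass
from typing import List, Tuple

Waveform = List[float]


@dataclass
class DistortionFeatures:
    # Toy stand-in: a single gain factor (assumption, not the paper's features).
    gain: float


def analysis(degraded: Waveform) -> Tuple[Waveform, DistortionFeatures]:
    """Split degraded speech into speech features and distortion features.

    Toy assumption: undistorted speech is peak-normalized to 1.0, so the
    observed peak is attributed entirely to the channel.
    """
    peak = max(abs(x) for x in degraded) or 1.0  # avoid dividing by zero
    speech_features = [x / peak for x in degraded]
    return speech_features, DistortionFeatures(gain=peak)


def synthesis(speech_features: Waveform) -> Waveform:
    """Turn speech features into a restored waveform (identity in this toy)."""
    return list(speech_features)


def channel(speech: Waveform, distortion: DistortionFeatures) -> Waveform:
    """Re-apply the extracted distortion to a waveform."""
    return [distortion.gain * x for x in speech]


def reconstruction_loss(a: Waveform, b: Waveform) -> float:
    """Mean squared error between two equal-length waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def restore_and_score(degraded: Waveform) -> Tuple[Waveform, float]:
    """Run analysis -> synthesis -> channel and compare with the input."""
    speech_features, distortion = analysis(degraded)
    restored = synthesis(speech_features)
    re_degraded = channel(restored, distortion)
    return restored, reconstruction_loss(re_degraded, degraded)


def transfer_effect(source_degraded: Waveform, target: Waveform) -> Waveform:
    """Audio effect transfer: take distortion from one signal, add it to another."""
    _, distortion = analysis(source_degraded)
    return channel(target, distortion)


if __name__ == "__main__":
    degraded = [0.0, 0.25, -0.5, 0.125]
    restored, loss = restore_and_score(degraded)
    print(restored, loss)
    print(transfer_effect(degraded, [1.0, -1.0]))
```

In this toy the loss is zero by construction, since the channel exactly inverts the analysis step. With learned modules the same wiring would give a quantity to minimize over real degraded recordings.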

Code & Models

Repositories

takaaki-saeki/ssl_speech_restoration
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis