PROSE: Perceptual Risk Optimization for Speech Enhancement

Jishnu Sadasivan, Chandra Sekhar Seelamantula, and Nagarjuna Reddy Muraka

arXiv:1710.03975 · eess.AS · October 12, 2017



TL;DR

This paper introduces a risk minimization framework for speech enhancement that minimizes an unbiased estimate of the distortion (risk), computed directly from the noisy observations, and improves denoising performance especially at higher SNRs.

Contribution

It develops a novel risk estimation approach for speech enhancement that does not require prior knowledge of clean speech statistics, using perceptually relevant distortion measures.

Findings

1. Outperforms traditional methods such as the Wiener filter and the log-MMSE estimator at SNRs above 5 dB.
2. Uses perceptually motivated distortion measures such as the Itakura-Saito and weighted hyperbolic cosine distortions.
3. Achieves better speech quality and intelligibility in evaluations.
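The Itakura-Saito and hyperbolic cosine distortions named above can be written down directly. A minimal sketch, assuming both are applied to nonnegative per-bin spectral powers p (clean) and p_hat (estimate) and summed over bins; the paper's weighting of the hyperbolic cosine measure is not reproduced, and the function names and the small eps guard are illustrative choices.

```python
import numpy as np

def itakura_saito(p, p_hat, eps=1e-12):
    """Itakura-Saito distortion summed over bins: r - log(r) - 1 with r = p / p_hat."""
    r = (p + eps) / (p_hat + eps)  # eps guards against division by zero and log(0)
    return float(np.sum(r - np.log(r) - 1.0))

def cosh_distortion(p, p_hat, eps=1e-12):
    """Unweighted hyperbolic cosine distortion summed over bins: cosh(log(r)) - 1.
    Equals the average of the Itakura-Saito distortion taken in both directions."""
    r = (p + eps) / (p_hat + eps)
    return float(np.sum(np.cosh(np.log(r)) - 1.0))
```

For example, itakura_saito(np.array([4.0]), np.array([1.0])) is about 1.614, while itakura_saito(np.array([1.0]), np.array([4.0])) is about 0.636: the Itakura-Saito measure penalizes underestimating spectral power more than overestimating it, whereas the hyperbolic cosine measure is symmetric in its two arguments.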

Abstract

The goal in speech enhancement is to obtain an estimate of clean speech starting from the noisy signal by minimizing a chosen distortion measure, which results in an estimate that depends on the unknown clean signal or its statistics. Since access to such prior knowledge is limited or not possible in practice, one has to estimate the clean signal statistics. In this paper, we develop a new risk minimization framework for speech enhancement, in which one optimizes an unbiased estimate of the distortion/risk instead of the actual risk. The estimated risk is expressed solely as a function of the noisy observations. We consider several perceptually relevant distortion measures and develop corresponding unbiased estimates under realistic assumptions on the noise distribution and a priori signal-to-noise ratio (SNR). Minimizing the risk estimates gives rise to the corresponding denoisers,…
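The central idea of replacing the actual risk with an unbiased estimate that depends only on the noisy observations is easiest to see for squared error, where Stein's unbiased risk estimate (SURE) applies. The sketch below is that swapped-in textbook case, not the paper's perceptual risk estimates: it assumes real-valued i.i.d. Gaussian noise of known variance sigma2 and a hypothetical denoiser family s_hat = a * x with a single gain a. For an estimator f(x), SURE is ||f(x) - x||^2 + 2 sigma2 div f(x) - n sigma2, whose expectation equals E||f(x) - s||^2, so the gain can be tuned without access to the clean signal s.

```python
import numpy as np

def sure_linear_gain(x, sigma2, a):
    """Stein's unbiased estimate of sum((a * x - s)**2) for x = s + w, w ~ N(0, sigma2 * I).
    Uses only the noisy observation x, the noise variance, and the gain a."""
    n = x.size
    residual = np.sum((a * x - x) ** 2)  # ||f(x) - x||^2
    divergence = a * n                   # sum_i d f_i / d x_i for f(x) = a * x
    return float(residual + 2.0 * sigma2 * divergence - n * sigma2)

def sure_optimal_gain(x, sigma2):
    """Minimizer of the SURE above over a >= 0 (the SURE is a convex quadratic in a)."""
    energy = np.sum(x ** 2)
    return max(0.0, 1.0 - x.size * sigma2 / energy)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = 2.0 * rng.standard_normal(1000)  # synthetic stand-in for a clean signal
    sigma2 = 1.0
    x = s + np.sqrt(sigma2) * rng.standard_normal(s.size)
    a = sure_optimal_gain(x, sigma2)
    print("gain:", a)
    print("SURE:", sure_linear_gain(x, sigma2, a))
    print("true error:", float(np.sum((a * x - s) ** 2)))
```

The clean signal s appears only in the demo, to compare the estimate against the actual error; the gain itself is chosen from x and sigma2 alone. The closed form follows from setting the derivative of (a - 1)^2 ||x||^2 + 2 sigma2 n a - n sigma2 with respect to a to zero.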


Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Hearing Loss and Rehabilitation