Hidden-Markov-Model Based Speech Enhancement

Daniel Dzibela; Armin Sehr

arXiv:1707.01090·cs.SD·July 6, 2017·1 cites

Open Access

TL;DR

This paper presents a Hidden-Markov-Model-based speech enhancement method that synthesizes speech using side information such as pitch, improving quality and intelligibility over text-only synthesis.

Contribution

It introduces a novel approach combining HMM-based synthesis with side information to enhance speech quality and intelligibility in noisy recordings.

Findings

01

Synthesized speech quality improves with side information use.

02

Nearly indistinguishable speech synthesis achieved with pitch data.

03

Models maintain intelligibility despite robotic sound quality.

Abstract

The goal of this contribution is to use a parametric speech synthesis system to reduce background noise and other interference in recorded speech signals. In a first step, the Hidden Markov Models of the synthesis system are trained. Two suitable training corpora consisting of text and corresponding speech files have been set up and cleared of various faults, including inaudible utterances and incorrect assignments between audio and text data. The corpora are tested and compared against each other with regard to, e.g., flaws in the synthesized speech and its naturalness and intelligibility. Different voices have thus been synthesized, whose quality depends less on the number of training samples used than on the cleanliness and signal-to-noise ratio of those samples. Generalized voice models have been used for synthesis, and the results differ greatly between the two speech corpora. Tests…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research