Non linear time compression of clear and normal speech at high rates

Cassia Valentini-Botinhao; Mirjam Wester; Junichi Yamagishi; Markus Toman; Michael Pucher; Dietmar Schabus

arXiv:1901.07239·eess.AS·January 23, 2019·1 cites

Open Access

TL;DR

This study compares linear and non-linear time compression methods applied to normal and clear speech. It finds that compressed normal speech remains more intelligible than compressed clear speech at high rates, and that compressing silence more than speech improves intelligibility further.

Contribution

It introduces and evaluates non-linear time compression techniques that better preserve intelligibility, especially by compressing silence more than speech.

Findings

01

Compressed normal speech is more intelligible than compressed clear speech at high rates.

02

Compressing silence more than speech enhances intelligibility with minimal computational cost.

03

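Naturally produced fast speech is less intelligible than linearly compressed normal speech. It is also less intelligible than compressed clear speech, except at the highest rate (three times faster than normal), where the advantage of clear over fast speech is lost.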

Abstract

We compare a series of time compression methods applied to normal and clear speech. First, we evaluate a linear (uniform) method applied to these styles as well as to naturally produced fast speech. We found, in line with the literature, that unprocessed fast speech was less intelligible than linearly compressed normal speech. Fast speech was also less intelligible than compressed clear speech, but at the highest rate (three times faster than normal) the advantage of clear over fast speech was lost. To test whether this was due to shorter speech duration, we evaluate, in our second experiment, a range of methods that compress speech and silence at different rates. We found that even when the overall duration of speech and silence is kept the same across styles, compressed normal speech is still more intelligible than compressed clear speech. Compressing silence twice as much as speech…
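The arithmetic behind compressing speech and silence at different rates while keeping the overall duration fixed can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `split_rates`, its parameters, and the assumption that a single rate applies to all speech and a single rate to all silence are hypothetical choices made here for the example.

```python
def split_rates(speech_s, silence_s, overall_rate, silence_factor=2.0):
    """Return (speech_rate, silence_rate) for a non-uniform compression.

    Hypothetical helper, not taken from the paper. A rate r means the
    segment duration is divided by r. The two rates are chosen so that:
      * silence_rate == silence_factor * speech_rate
      * speech_s / speech_rate + silence_s / silence_rate
            == (speech_s + silence_s) / overall_rate
    """
    if speech_s < 0 or silence_s < 0 or speech_s + silence_s <= 0:
        raise ValueError("durations must be non-negative with a positive total")
    if overall_rate <= 0 or silence_factor <= 0:
        raise ValueError("rates and factors must be positive")

    total = speech_s + silence_s
    # Solve speech_s / r + silence_s / (k * r) = total / overall_rate for r.
    speech_rate = overall_rate * (speech_s + silence_s / silence_factor) / total
    silence_rate = silence_factor * speech_rate
    return speech_rate, silence_rate


def compressed_duration(speech_s, silence_s, speech_rate, silence_rate):
    """Total duration after compressing each part at its own rate."""
    return speech_s / speech_rate + silence_s / silence_rate


if __name__ == "__main__":
    # Example values are invented: 2.4 s of speech, 0.6 s of silence,
    # overall three times faster, silence compressed twice as much as speech.
    sp, sil = split_rates(2.4, 0.6, overall_rate=3.0, silence_factor=2.0)
    print(sp, sil, compressed_duration(2.4, 0.6, sp, sil))
```

With these invented durations the speech rate comes out at roughly 2.7 and the silence rate at roughly 5.4, so speech is compressed less than the overall factor of 3 while the total duration still shrinks to one third. Setting `silence_factor=1.0` recovers the linear (uniform) case, where both rates equal `overall_rate`.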

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques