Cognitive Coding of Speech

Reza Lotfidereshgi; Philippe Gournay

arXiv:2110.04241·eess.AS·October 11, 2021

Open Access

TL;DR

This paper introduces a hierarchical neural network approach for unsupervised cognitive coding of speech, capturing different speech attributes at multiple time scales, with applications in speech compression.

Contribution

It presents a novel two-stage neural network model that hierarchically encodes speech attributes at different temporal resolutions, improving predictive capability and compression performance.

Findings

01. Performance exceeds the state of the art on the LibriSpeech and EmoV-DB datasets.

02. The model captures phoneme, speaker, and emotion attributes effectively.

03. The extracted representations are robust to dimensionality reduction and low-bitrate quantization.

Abstract

We propose an approach for cognitive coding of speech by unsupervised extraction of contextual representations in two hierarchical levels of abstraction. Speech attributes such as phoneme identity that last one hundred milliseconds or less are captured in the lower level of abstraction, while speech attributes such as speaker identity and emotion that persist up to one second are captured in the higher level of abstraction. This decomposition is achieved by a two-stage neural network, with a lower and an upper stage operating at different time scales. Both stages are trained to predict the content of the signal in their respective latent spaces. A top-down pathway between stages further improves the predictive capability of the network. With an application in speech compression in mind, we investigate the effect of dimensionality reduction and low bitrate quantization on the extracted…

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Advanced Data Compression Techniques