Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends

Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Junaid Qadir, and Björn W. Schuller

arXiv:2001.00378 · cs.SD · September 27, 2021 · 67 cites

Open Access

TL;DR

This paper provides a comprehensive survey of deep representation learning techniques in speech processing, highlighting recent advances, challenges, and future trends across ASR, speaker recognition, and emotion recognition.

Contribution

The paper consolidates scattered research on speech representation learning across three areas (ASR, speaker recognition, and emotion recognition) and emphasizes deep learning's role in automatic feature extraction.

Findings

01. Representation learning improves speech task performance

02. Deep learning reduces dependence on manual feature engineering

03. Survey covers recent advances and future directions

Abstract

Research on speech processing has traditionally treated the design of hand-engineered acoustic features (feature engineering) as a problem distinct from the design of efficient machine learning (ML) models that make prediction and classification decisions. This approach has two main drawbacks: first, manual feature engineering is cumbersome and requires human knowledge; second, the designed features might not be the best for the objective at hand. This has motivated a recent trend in the speech community towards the utilisation of representation learning techniques, which can automatically learn an intermediate representation of the input signal that better suits the task at hand and hence leads to improved performance. The significance of representation learning has increased with advances in deep learning (DL), where the…

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing