Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends
Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Junaid Qadir, and Björn W. Schuller

TL;DR
This paper provides a comprehensive survey of deep representation learning techniques in speech processing, highlighting recent advances, challenges, and future trends across ASR, speaker recognition, and emotion recognition.
Contribution
It consolidates scattered research on speech representation learning across three key areas (ASR, speaker recognition, and emotion recognition) and emphasizes deep learning's role in automatic feature extraction.
Findings
Representation learning improves speech task performance
Deep learning reduces dependence on manual feature engineering
Survey covers recent advances and future directions
Abstract
Research on speech processing has traditionally treated the design of hand-engineered acoustic features (feature engineering) as a problem distinct from the design of efficient machine learning (ML) models for prediction and classification. This approach has two main drawbacks: first, manual feature engineering is cumbersome and requires human knowledge; second, the designed features might not be the best for the objective at hand. This has motivated a recent trend in the speech community towards the utilisation of representation learning techniques, which can automatically learn an intermediate representation of the input signal that better suits the task at hand and hence leads to improved performance. The significance of representation learning has increased with advances in deep learning (DL), where the…
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
