# Deep Learning for Audio Signal Processing

**Authors:** Hendrik Purwins (1), Bo Li (2), Tuomas Virtanen (3), Jan Schlüter (4, 5), Shuo-yiin Chang (2), Tara Sainath (2) ((1) Aalborg University, Copenhagen, (2) Google, (3) Tampere University, (4) Université de Toulon, (5) Austrian Research Institute for Artificial Intelligence)

arXiv: 1905.00078 · 2019-05-28

## TL;DR

This paper reviews recent deep learning techniques for audio signal processing across speech, music, and environmental sound, covering feature representations, models, applications, and open challenges.

## Contribution

It provides a comprehensive overview of deep learning methods and applications across various audio domains, emphasizing cross-domain insights and future research directions.

## Key findings

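- Convolutional neural networks and variants of the long short-term memory architecture are the main models reviewed, alongside more audio-specific networks; log-mel spectra and raw waveforms are the dominant feature representations.
- Applications span audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music).
- The review closes by identifying key issues and open questions for deep learning applied to audio signal processing.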

## Abstract

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
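The abstract names log-mel spectra as one of the two dominant feature representations. As a rough illustration of what that representation involves, the following is a minimal NumPy sketch, not taken from the paper. The function names and all parameter values (16 kHz sample rate, 25 ms Hann window, 10 ms hop, 40 mel bands, the `eps` floor) are assumptions chosen for illustration, and the mel-scale formula used is one common convention among several.

```python
import numpy as np


def hz_to_mel(f):
    # One common mel-scale convention (assumed here, not specified by the paper).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(x, sr=16000, n_fft=400, hop=160, n_mels=40, eps=1e-10):
    # Slice the waveform into overlapping Hann-windowed frames.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum per frame, then pool frequency bins into mel bands.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    # Log compression; eps avoids log(0) for silent bands.
    return np.log(mel + eps)
```

With these assumed defaults, one second of 16 kHz audio yields a feature matrix of 98 frames by 40 mel bands. The raw-waveform alternative mentioned in the abstract would skip all of these steps and pass `x` to the model directly.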

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.00078/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1905.00078/full.md

## References

153 references — full list in the complete paper: https://tomesphere.com/paper/1905.00078/full.md

---
Source: https://tomesphere.com/paper/1905.00078