A Survey on Audio Synthesis and Audio-Visual Multimodal Processing

Zhaofeng Shi

arXiv:2108.00443 · eess.AS · August 3, 2021 · 5 cites

Open Access

TL;DR

This survey reviews recent advances in audio synthesis and audio-visual multimodal processing, covering techniques like TTS and music generation, and discusses future research directions in these rapidly evolving fields.

Contribution

It provides a comprehensive classification and analysis of current methods in audio synthesis and multimodal processing, highlighting future development trends.

Findings

01 Classification of technical methods in audio synthesis and multimodal processing

02 Analysis of current research trends and future directions

03 Guidance for researchers in related fields

Abstract

With the development of deep learning and artificial intelligence, audio synthesis has come to play a pivotal role in machine learning and shows strong applicability in industry. Meanwhile, researchers have dedicated significant effort to multimodal tasks such as audio-visual multimodal processing. In this paper, we survey audio synthesis and audio-visual multimodal processing to help readers understand current research and future trends. The review focuses on text-to-speech (TTS), music generation, and several tasks that combine visual and acoustic information. The corresponding technical methods are comprehensively classified and introduced, and their future development trends are discussed. This survey can provide guidance for researchers interested in areas such as audio synthesis and audio-visual multimodal processing.

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis