Overview of the Amphion Toolkit (v0.2)
Jiaqi Li, Xueyao Zhang, Yuancheng Wang, Haorui He, Chaoren Wang, Li Wang, Huan Liao, Junyi Ao, Zeyu Xie, Yiqiao Huang, Junan Zhang, Zhizheng Wu

TL;DR
Amphion v0.2 is an open-source toolkit that simplifies audio, music, and speech generation tasks by providing diverse models, a large multilingual dataset, and comprehensive tutorials for researchers and engineers.
Contribution
This paper introduces Amphion v0.2, featuring a large multilingual dataset, new models, and tutorials, enhancing accessibility and usability for audio generation research.
Findings
100K-hour open-source multilingual dataset released
New models for TTS, audio coding, and voice conversion
Comprehensive tutorials for user guidance
Abstract
Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to lower the entry barrier for junior researchers and engineers in these fields. It provides a versatile framework that supports a variety of generation tasks and models. In this report, we introduce Amphion v0.2, the second major release developed in 2024. This release features a 100K-hour open-source multilingual dataset, a robust data preparation pipeline, and novel models for tasks such as text-to-speech, audio coding, and voice conversion. Furthermore, the report includes multiple tutorials that guide users through the functionalities and usage of the newly released models.
