EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

Shijia Liao; Shiyi Lan; Arun George Zachariah

arXiv:2402.00892 · cs.SD · February 5, 2024 · 2 cites

TL;DR

EVA-GAN introduces a scalable, high-fidelity audio generation model that significantly improves spectral quality, high-frequency reconstruction, and robustness, enabling realistic 44.1kHz audio synthesis from large datasets.

Contribution

The paper presents EVA-GAN, a scalable GAN architecture for high-fidelity audio generation that addresses spectral discontinuities and weak out-of-domain robustness by training on large datasets.

Findings

01 Outperforms previous models in spectral and high-frequency quality

02 Achieves robust performance on out-of-domain audio data

03 Enables high-quality 44.1kHz audio synthesis

Abstract

The advent of Large Models marks a new era in machine learning, significantly outperforming smaller models by leveraging vast datasets to capture and synthesize complex patterns. Despite these advancements, the exploration of scaling, especially in the audio generation domain, remains limited: previous efforts did not extend into the high-fidelity (HiFi) 44.1kHz domain and suffered from both spectral discontinuities and blurriness in the high-frequency domain, alongside a lack of robustness against out-of-domain data. These limitations restrict the applicability of models to diverse use cases, including music and singing generation. Our work introduces Enhanced Various Audio Generation via Scalable Generative Adversarial Networks (EVA-GAN), which yields significant improvements over the previous state-of-the-art in spectral and high-frequency reconstruction and robustness in out-of-domain…

Code & Models

Repositories

fishaudio/vocoder (PyTorch)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis