InfiniteAudio: Infinite-Length Audio Generation with Consistency

Chaeyoung Jung; Hojoon Ki; Ji-Hoon Kim; Junmo Kim; Joon Son Chung

arXiv:2506.03020 · eess.AS · June 4, 2025 · Interspeech

Open Access

TL;DR

InfiniteAudio enables the generation of seamless, infinite-length audio by integrating a novel inference strategy and selective denoising into existing diffusion-based text-to-audio models, overcoming memory constraints and inconsistency issues.

Contribution

It introduces InfiniteAudio, a method combining FIFO sampling and curved denoising to produce continuous audio without additional training, addressing key challenges in long-duration audio synthesis.

Findings

01. Achieves high-quality infinite-length audio generation.

02. Maintains consistency across generated audio segments.

03. Matches or outperforms existing methods across all reported metrics.

Abstract

This paper presents InfiniteAudio, a simple yet effective strategy for generating infinite-length audio using diffusion-based text-to-audio methods. Current approaches face memory constraints because the output size increases with input length, making long-duration generation challenging. A common workaround is to concatenate short audio segments, but this often leads to inconsistencies due to the lack of shared temporal context. To address this, InfiniteAudio integrates seamlessly into existing pipelines without additional training. It introduces two key techniques: FIFO sampling, a first-in, first-out inference strategy with fixed-size inputs, and curved denoising, which selectively prioritizes key diffusion steps for efficiency. Experiments show that InfiniteAudio achieves comparable or superior performance across all metrics. Audio samples are available on our project page.
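
As a rough illustration of the first-in, first-out idea with fixed-size inputs, the toy sketch below keeps a constant-length queue of frames at staggered noise levels, applies one denoising step to the whole queue per iteration, emits the finished frame at the front, and appends fresh noise at the back. It is not the paper's implementation: `toy_denoise`, `fifo_generate`, the `window` parameter, and the scalar "frames" are all hypothetical stand-ins, and curved denoising is not modeled.

```python
from collections import deque
import random


def toy_denoise(value, steps_left):
    """Hypothetical stand-in for one diffusion denoising step.

    A real text-to-audio model would run here; this toy just halves
    the value so the example runs without any model.
    """
    return value * 0.5


def fifo_generate(num_frames, window=4, seed=0):
    """Emit `num_frames` frames while holding only `window` frames in memory.

    Assumption: one queue slot per remaining denoising step, so the
    front of the queue is always exactly one step away from being clean.
    """
    rng = random.Random(seed)
    # Each entry is (value, steps_left); the front has the fewest steps left.
    queue = deque((rng.gauss(0.0, 1.0), i + 1) for i in range(window))
    output = []
    while len(output) < num_frames:
        # One pass over the fixed-size window: every frame advances one step.
        queue = deque((toy_denoise(v, s), s - 1) for v, s in queue)
        # First in, first out: the front frame is now fully denoised.
        value, steps_left = queue.popleft()
        assert steps_left == 0
        output.append(value)
        # Keep the window size constant by appending fresh noise at the back.
        queue.append((rng.gauss(0.0, 1.0), window))
        assert len(queue) == window
    return output


frames = fifo_generate(num_frames=10, window=4)
print(len(frames))
```

Because the queue length never changes, memory use in this sketch does not depend on how many frames are requested. One simplification to keep in mind: the first `window - 1` frames emitted here receive fewer than `window` denoising steps, so a practical scheme would need some form of warm-up for the initial queue.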

Taxonomy

Topics: Music Technology and Sound Studies · Speech and Audio Processing · Music and Audio Processing

Methods: Diffusion