SonicRAG: High Fidelity Sound Effects Synthesis Based on Retrieval Augmented Generation

Yu-Ren Guo; Wen-Kai Tai

arXiv:2505.03244 · cs.SD · May 14, 2025

TL;DR

SonicRAG is a retrieval-augmented framework that combines large language models with sound effect databases to improve the diversity and quality of high-fidelity sound effect synthesis without additional recording costs.

Contribution

The paper presents a novel retrieval-augmented sound effects synthesis framework that enhances audio quality and diversity by integrating LLMs with sound effect databases.

Findings

01

Improved sound effect diversity and quality.

02

Elimination of additional recording costs.

03

Flexible and efficient sound design process.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing (NLP) and multimodal learning, with successful applications in text generation and speech synthesis, enabling a deeper understanding and generation of multimodal content. In the field of sound effects (SFX) generation, LLMs have been leveraged to orchestrate multiple models for audio synthesis. However, due to the scarcity of annotated datasets and the complexity of temporal modeling, current SFX generation techniques still fall short in achieving high-fidelity audio. To address these limitations, this paper introduces a novel framework that integrates LLMs with existing sound effect databases, allowing for the retrieval, recombination, and synthesis of audio based on user requirements. By leveraging this approach, we enhance the diversity and quality of generated sound effects while…
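The abstract describes the pipeline only at a high level: an LLM interprets the user's request, relevant clips are retrieved from an existing SFX database, and the results are recombined. As a rough illustration, and not the authors' implementation, the Python sketch below assumes the LLM has already turned a request into a hypothetical "plan" of sub-queries, each with a sample offset and a gain. It substitutes simple tag-overlap (Jaccard) scoring for whatever retrieval the paper actually uses, and treats recombination as layering clips and peak-normalizing the mix. `Clip`, `score`, `retrieve`, `mix`, and the plan format are all hypothetical names, and any generative synthesis stage is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical database entry: a named mono clip with descriptive tags."""
    name: str
    tags: set[str]
    samples: list[float]  # mono PCM samples in [-1.0, 1.0]


def score(query: str, clip: Clip) -> float:
    """Jaccard overlap between query words and clip tags.

    A stand-in for the paper's retrieval step (assumption: the real system
    likely uses semantic matching, not keyword overlap).
    """
    words = set(query.lower().split())
    if not words or not clip.tags:
        return 0.0
    return len(words & clip.tags) / len(words | clip.tags)


def retrieve(query: str, db: list[Clip]) -> Clip | None:
    """Return the best-matching clip, or None if no clip shares any tag."""
    best = max(db, key=lambda c: score(query, c), default=None)
    if best is None or score(query, best) == 0.0:
        return None
    return best


def mix(plan: list[dict], db: list[Clip]) -> list[float]:
    """Layer retrieved clips at non-negative sample offsets with per-layer gain.

    Plan steps with no matching clip are skipped; the result is scaled down
    only if its peak exceeds 1.0, to avoid clipping.
    """
    layers = []
    for step in plan:
        clip = retrieve(step["query"], db)
        if clip is not None:
            layers.append((step["offset"], step["gain"], clip.samples))
    length = max((off + len(s) for off, _, s in layers), default=0)
    out = [0.0] * length
    for off, gain, samples in layers:
        for i, x in enumerate(samples):
            out[off + i] += gain * x
    peak = max((abs(x) for x in out), default=0.0)
    if peak > 1.0:
        out = [x / peak for x in out]
    return out


# Toy database with made-up sample values (illustrative only).
db = [
    Clip("rain_light", {"rain", "light", "ambience"}, [0.2, 0.2, 0.2, 0.2]),
    Clip("thunder_far", {"thunder", "distant", "rumble"}, [0.9, 0.6]),
    Clip("door_creak", {"door", "creak", "wood"}, [0.5, -0.5]),
]

# A plan an LLM might hypothetically emit for "a storm with distant thunder".
plan = [
    {"query": "light rain ambience", "offset": 0, "gain": 1.0},
    {"query": "distant thunder", "offset": 2, "gain": 1.0},
]

out = mix(plan, db)
```

Here the rain bed and the thunder overlap at samples 2 and 3, so the summed peak exceeds 1.0 and the whole mix is rescaled. Keeping retrieval and mixing as separate functions mirrors the retrieve-then-recombine split in the abstract and would let a learned retriever replace `score` without touching the mixer.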

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music Technology and Sound Studies