From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories
Songwei Ge, Curtis Xuan, Ruihua Song, Chao Zou, Wei Liu, Jin Zhou

TL;DR
This paper presents a retrieval-based framework enhanced with semantic inference for automatically adding sound effects to radio stories, reducing manual labor and improving quality.
Contribution
It introduces a hybrid retrieval and semantic inference model with specialized features and heuristic rules, trained on new crowdsourced datasets for better sound effect integration.
Findings
The proposed model achieves robust retrieval results.
Semantic features improve sound effect accuracy.
The pipeline enhances automatic radio story production.
Abstract
Sound effects play an essential role in producing high-quality radio stories, but adding them is highly labor-intensive. In this paper, we address the problem of automatically adding sound effects to radio stories with a retrieval-based model. However, directly implementing a tag-based retrieval model produces many false positives because story content is ambiguous. To solve this problem, we introduce a retrieval-based framework hybridized with a semantic inference model, which helps to achieve robust retrieval results. Our model relies on carefully designed features extracted from the context of candidate triggers. We collect two story dubbing datasets through crowdsourcing to analyze the setting of adding sound effects and to train and test our proposed methods. We further discuss the importance of each feature and introduce several heuristic rules for the trade-off between precision and…
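
The idea in the abstract, tag-based retrieval followed by a context check on each candidate trigger, can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the tag table `SOUND_TAGS`, the cue lists, the hand-set penalties, and the `threshold` parameter. The rule-based `context_score` is only a toy stand-in for the paper's trained semantic inference model, whose actual features are not listed in this abstract.

```python
from dataclasses import dataclass

# Hypothetical tag -> sound-effect table (not from the paper).
SOUND_TAGS = {"door": "door_slam.wav", "rain": "rain_loop.wav", "dog": "dog_bark.wav"}

# Hypothetical context cues suggesting the trigger is not an audible event.
NEGATION_CUES = {"no", "not", "never", "without"}
FIGURATIVE_CUES = {"like", "imagined", "remembered", "dreamed"}


@dataclass
class Candidate:
    position: int   # token index of the trigger word
    trigger: str    # the matched tag
    sound: str      # sound-effect file associated with the tag
    score: float    # context score in [0, 1]


def tokenize(sentence: str) -> list[str]:
    """Lowercase, whitespace-split, and strip surrounding punctuation."""
    return [w.strip(".,!?;:\"'").lower() for w in sentence.split()]


def context_score(tokens: list[str], i: int, window: int = 3) -> float:
    """Toy stand-in for semantic inference: penalize triggers whose
    neighboring words suggest negation or figurative use."""
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    score = 1.0
    if any(w in NEGATION_CUES for w in context):
        score -= 0.6
    if any(w in FIGURATIVE_CUES for w in context):
        score -= 0.4
    return max(score, 0.0)


def retrieve(sentence: str, threshold: float = 0.5) -> list[Candidate]:
    """Tag-based retrieval, then keep only candidates whose context
    score reaches the threshold."""
    tokens = tokenize(sentence)
    kept = []
    for i, tok in enumerate(tokens):
        if tok in SOUND_TAGS:
            s = context_score(tokens, i)
            if s >= threshold:
                kept.append(Candidate(i, tok, SOUND_TAGS[tok], s))
    return kept


if __name__ == "__main__":
    print(retrieve("The door slammed behind her."))
    print(retrieve("There was no rain that night."))
```

In this sketch, raising `threshold` discards more ambiguous triggers (favoring precision), while setting it to 0 falls back to plain tag matching (favoring recall).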
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Video Analysis and Summarization
