VoxEffects: A Speech-Oriented Audio Effects Dataset and Benchmark

Zhe Zhang; Yigitcan Özer; Junichi Yamagishi

arXiv:2604.12389 · eess.AS · April 15, 2026

TL;DR

VoxEffects is a new speech audio effects dataset and benchmark for the systematic study of post-production effects on speech, supporting effect identification and robustness evaluation.

Contribution

It introduces a speech dataset with exact effect-chain annotations at multiple granularities, together with an evaluation benchmark for speech-oriented audio effect analysis.

Findings

01. A baseline is established with an AudioMAE-based multi-task model.
02. Domain shift, robustness, input duration, and gender fairness are analyzed.
03. The dataset supports both offline synthesis and on-the-fly rendering for training and evaluation.
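
To make on-the-fly rendering with exact supervision concrete, here is a minimal Python sketch. It assumes a hypothetical three-effect vocabulary (`gain`, `lowpass`, `echo`) with made-up presets, and treats intensity as a wet/dry mix in [0, 1]; none of these names, presets, or parameter ranges come from VoxEffects itself. Because the chain is sampled and applied in code, the presence, preset, and intensity labels are known exactly rather than annotated after the fact.

```python
import numpy as np

# Hypothetical effect vocabulary: effect -> {preset name: parameter}.
# Names, presets, and values are illustrative assumptions, not VoxEffects' own.
EFFECTS = {
    "gain": {"quiet": 0.5, "loud": 2.0},         # linear gain factor
    "lowpass": {"phone": 0.9, "muffled": 0.98},  # one-pole smoothing coefficient
    "echo": {"short": 0.05, "long": 0.25},       # delay in seconds
}
EFFECT_NAMES = list(EFFECTS)


def apply_effect(x, sr, name, param):
    """Apply one effect at full strength (fully wet)."""
    if name == "gain":
        return x * param
    if name == "lowpass":
        y = np.empty_like(x)
        acc = 0.0
        for i, s in enumerate(x):  # y[n] = a*y[n-1] + (1-a)*x[n]
            acc = param * acc + (1.0 - param) * s
            y[i] = acc
        return y
    if name == "echo":
        d = int(param * sr)  # delay in samples
        y = x.copy()
        if 0 < d < len(x):
            y[d:] += 0.5 * x[: len(x) - d]  # one feedforward echo at half amplitude
        return y
    raise ValueError(f"unknown effect: {name}")


def render(x, sr, rng):
    """Sample a random effect chain, apply it, and return (audio, exact labels)."""
    n = len(EFFECT_NAMES)
    presence = np.zeros(n, dtype=np.int64)
    preset = np.full(n, -1, dtype=np.int64)    # -1 marks an absent effect
    intensity = np.zeros(n, dtype=np.float64)  # assumed: wet/dry mix in [0, 1]
    y = np.asarray(x, dtype=np.float64)
    order = rng.permutation(n)                 # random processing order
    for k in order:
        if rng.random() < 0.5:                 # each effect present with prob. 0.5
            continue
        name = EFFECT_NAMES[k]
        names = list(EFFECTS[name])
        p = int(rng.integers(len(names)))
        mix = float(rng.uniform(0.2, 1.0))
        wet = apply_effect(y, sr, name, EFFECTS[name][names[p]])
        y = (1.0 - mix) * y + mix * wet
        presence[k], preset[k], intensity[k] = 1, p, mix
    chain = [EFFECT_NAMES[k] for k in order if presence[k]]
    return y, {"presence": presence, "preset": preset,
               "intensity": intensity, "chain": chain}


# Usage: 1 s of noise stands in for a clean speech utterance.
sr = 16000
clean = 0.1 * np.random.default_rng(0).standard_normal(sr)
audio, labels = render(clean, sr, np.random.default_rng(42))
print(labels["chain"], labels["presence"], labels["preset"])
```

Calling `render` inside a data loader draws a fresh chain at every step, which corresponds to the on-the-fly mode; calling it once per utterance with a fixed seed and saving the outputs corresponds to offline synthesis.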

Abstract

Speech audio in the wild is often processed by post-production effects, but existing speech datasets rarely provide precise annotations of effects and parameters, limiting systematic study. We introduce VoxEffects, a speech audio effects dataset that pairs produced speech with exact effect-chain supervision at multiple granularities. VoxEffects supports speech-oriented audio effect identification: given a produced waveform, infer which effects are present and how they are applied. Built from minimally edited clean speech, it provides an extensible rendering pipeline for both offline synthesis and on-the-fly rendering for efficient training and evaluation. The audio effect identification benchmark includes effect presence detection, preset classification, and intensity prediction, with a robustness protocol covering capture-side and platform-side degradations. We provide an…
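
The three benchmark tasks map naturally onto separate output heads of a single multi-task model, as in the AudioMAE-based baseline listed under Findings. The sketch below computes one plausible joint objective from head outputs only (the encoder is omitted). Its label layout is a hypothetical convention (presence as 0/1 per effect, preset index with -1 for absent effects, intensity in [0, 1]), and the choice of binary cross-entropy, cross-entropy, and mean squared error, the masking of absent effects, and the equal task weights are assumptions rather than the paper's reported setup.

```python
import numpy as np

# Hypothetical joint objective for the three benchmark tasks; loss choices,
# masking, and weights are illustrative assumptions, not VoxEffects' baseline.

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy, averaged over effects."""
    return float(np.mean(np.maximum(logits, 0.0) - logits * targets
                         + np.log1p(np.exp(-np.abs(logits)))))


def cross_entropy(logits, label):
    """Softmax cross-entropy for one effect's preset logits."""
    z = logits - np.max(logits)  # shift for numerical stability
    return float(np.log(np.sum(np.exp(z))) - z[label])


def multitask_loss(presence_logits, preset_logits, intensity_pred, labels,
                   weights=(1.0, 1.0, 1.0)):
    """Presence detection + preset classification + intensity regression."""
    presence = labels["presence"].astype(np.float64)
    loss = weights[0] * bce_with_logits(presence_logits, presence)
    present = np.flatnonzero(labels["presence"])  # preset/intensity defined only here
    if present.size == 0:
        return loss
    l_preset = np.mean([cross_entropy(preset_logits[k], labels["preset"][k])
                        for k in present])
    l_int = np.mean((intensity_pred[present] - labels["intensity"][present]) ** 2)
    return float(loss + weights[1] * l_preset + weights[2] * l_int)


# Example: 3 effects, 2 presets each, untrained (all-zero) classification heads.
labels = {"presence": np.array([1, 0, 1]),
          "preset": np.array([1, -1, 0]),
          "intensity": np.array([0.5, 0.0, 0.3])}
loss = multitask_loss(np.zeros(3), [np.zeros(2)] * 3,
                      np.array([0.5, 0.0, 0.3]), labels)
print(round(loss, 4))  # 1.3863, i.e. 2 * ln 2
```

Masking the preset and intensity terms for absent effects keeps the model from being penalized for predictions that have no ground truth; whether the actual baseline does this is not stated in the abstract.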
