Enforcing Speech Content Privacy in Environmental Sound Recordings using Segment-wise Waveform Reversal
Modan Tailleur, Mathieu Lagrange, Pierre Aumond, Vincent Tourre

TL;DR
This paper presents a waveform segment reversal method combined with voice activity detection to effectively obscure speech in environmental recordings, protecting privacy while maintaining audio quality and scene integrity.
Contribution
The authors introduce a waveform segment reversal approach, paired with a voice activity detection and speech separation pipeline that targets speech more precisely, and evaluate it on speech intelligibility (WER), sound source detectability (SCAD), and audio quality (FAD).
Findings
97.9% reduction in speech intelligibility (WER)
Minimal impact on sound source detectability (2.7% SCAD drop)
High audio quality preserved (FAD of 1.40)
Abstract
Environmental sound recordings often contain intelligible speech, raising privacy concerns that limit analysis, sharing and reuse of data. In this paper, we introduce a method that renders speech unintelligible while preserving both the integrity of the acoustic scene and the overall audio quality. Our approach involves reversing waveform segments to distort speech content. This process is enhanced through a voice activity detection and speech separation pipeline, which allows for more precise targeting of speech. In order to demonstrate the effectiveness of the proposed approach, we consider a three-part evaluation protocol that assesses: 1) speech intelligibility using Word Error Rate (WER), 2) sound source detectability using Sound source Classification Accuracy-Drop (SCAD) from a widely used pre-trained model, and 3) audio quality using the Fréchet Audio Distance (FAD),…
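The core idea of segment-wise waveform reversal can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `reverse_segments`, the `segment_len` parameter, and the optional per-sample `speech_mask` (a stand-in for the output of a voice activity detector) are assumptions made for illustration, and the abstract does not state the segment duration actually used.

```python
import numpy as np


def reverse_segments(waveform, segment_len, speech_mask=None):
    """Time-reverse consecutive blocks of `segment_len` samples.

    `speech_mask` is a hypothetical per-sample boolean array (e.g. from a
    voice activity detector). When given, only blocks containing at least
    one speech-flagged sample are reversed; other blocks are left intact.
    """
    waveform = np.asarray(waveform)
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if speech_mask is not None:
        speech_mask = np.asarray(speech_mask, dtype=bool)
        if speech_mask.shape != waveform.shape:
            raise ValueError("speech_mask must match waveform shape")

    out = waveform.copy()
    for start in range(0, len(waveform), segment_len):
        end = start + segment_len  # slicing clips the final partial block
        if speech_mask is None or speech_mask[start:end].any():
            # Read from the untouched input so source and target never overlap.
            out[start:end] = waveform[start:end][::-1]
    return out


# Toy signal: 7 samples, blocks of 3 (the last block holds a single sample).
x = np.arange(7)
y = reverse_segments(x, 3)
print(y.tolist())  # [2, 1, 0, 5, 4, 3, 6]
```

In this sketch, applying the function twice with the same `segment_len` and mask returns the original signal, so a plain reversal should be read as an obfuscation of speech content rather than an irreversible transformation.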
