Enforcing Speech Content Privacy in Environmental Sound Recordings using Segment-wise Waveform Reversal

Modan Tailleur; Mathieu Lagrange; Pierre Aumond; Vincent Tourre

arXiv:2507.08412·cs.SD·July 14, 2025

TL;DR

This paper presents a waveform segment reversal method combined with voice activity detection to effectively obscure speech in environmental recordings, protecting privacy while maintaining audio quality and scene integrity.

Contribution

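The authors introduce a segment-wise waveform reversal approach, paired with a voice activity detection and speech separation pipeline that targets speech more precisely, and evaluate it with a three-part protocol covering speech intelligibility (WER), sound source detectability (SCAD), and audio quality (FAD).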

Findings

01

97.9% reduction in speech intelligibility (WER)

02

Minimal impact on sound source detectability (2.7% SCAD drop)

03

High audio quality preserved (FAD of 1.40)
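
For readers unfamiliar with the metric behind the first finding: WER is conventionally the word-level edit distance between a reference transcript and a recognizer's output, divided by the number of reference words. The sketch below illustrates that standard definition only. The function name and example strings are illustrative assumptions, and it is not the paper's evaluation code.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of three.
example_wer = word_error_rate("the cat sat", "the dog sat")
```

WER can exceed 1.0 when the hypothesis contains many inserted words, so a higher value means less of the original speech was recovered by the recognizer.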

Abstract

Environmental sound recordings often contain intelligible speech, raising privacy concerns that limit analysis, sharing and reuse of data. In this paper, we introduce a method that renders speech unintelligible while preserving both the integrity of the acoustic scene and the overall audio quality. Our approach involves reversing waveform segments to distort speech content. This process is enhanced through a voice activity detection and speech separation pipeline, which allows for more precise targeting of speech. In order to demonstrate the effectiveness of the proposed approach, we consider a three-part evaluation protocol that assesses: 1) speech intelligibility using Word Error Rate (WER), 2) sound source detectability using Sound source Classification Accuracy-Drop (SCAD) from a widely used pre-trained model, and 3) audio quality using the Fréchet Audio Distance (FAD),…
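
The core idea of reversing waveform segments can be illustrated with a minimal sketch. It assumes fixed-length, non-overlapping segments applied to the whole signal. The function name `reverse_segments` and the segment length are hypothetical, and the voice activity detection and speech separation stages described in the abstract are not reproduced here.

```python
from typing import List, Sequence


def reverse_segments(samples: Sequence[float], segment_len: int) -> List[float]:
    """Flip the time order of samples inside consecutive fixed-length segments.

    Assumption: non-overlapping segments of `segment_len` samples; a shorter
    trailing segment is reversed as well.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    out: List[float] = []
    for start in range(0, len(samples), segment_len):
        segment = samples[start:start + segment_len]
        out.extend(reversed(segment))  # same samples, reversed local order
    return out


# Toy "waveform" of 8 samples with a hypothetical segment length of 3 samples.
toy = [0, 1, 2, 3, 4, 5, 6, 7]
flipped = reverse_segments(toy, 3)  # [2, 1, 0, 5, 4, 3, 7, 6]
```

Each segment keeps exactly the same sample values, so only the local time order changes. In practice the segment length would be given as a duration and converted to samples using the recording's sampling rate.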
