StreamMark: A Deep Learning-Based Semi-Fragile Audio Watermarking for Proactive Deepfake Detection
Zhentao Liu, Milos Cernak

TL;DR
StreamMark is a deep learning-based, semi-fragile audio watermarking system for proactive deepfake detection: the watermark stays recoverable under benign transformations but breaks under malicious manipulations.
Contribution
StreamMark introduces a complex-domain embedding technique within an Encoder-Distortion-Decoder architecture that is trained explicitly to distinguish benign transformations from malicious ones.
Findings
StreamMark achieves high imperceptibility, with an SNR of 24.16 dB and a PESQ score of 4.20.
The watermark is resilient to real-world distortions such as Opus encoding.
The watermark is fragile to deepfake manipulations: message recovery falls to near chance level (~50%).
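To make these figures concrete, here is a minimal Python sketch of how an SNR in dB and a bit-level message recovery rate are commonly computed, and how a recovery rate near 50% could be read as a sign of manipulation. The function names and the 0.9 decision threshold are illustrative assumptions, not taken from the paper. PESQ is not computed here.

```python
import numpy as np

def snr_db(original, watermarked):
    """Signal-to-noise ratio in dB, treating the watermark as the noise."""
    original = np.asarray(original, dtype=np.float64)
    noise = np.asarray(watermarked, dtype=np.float64) - original
    signal_power = np.sum(original ** 2)
    noise_power = np.sum(noise ** 2)
    if noise_power == 0:
        return float("inf")  # identical signals: no embedding distortion
    return float(10.0 * np.log10(signal_power / noise_power))

def bit_accuracy(sent_bits, recovered_bits):
    """Fraction of message bits recovered correctly (0.5 is chance level)."""
    sent_bits = np.asarray(sent_bits)
    recovered_bits = np.asarray(recovered_bits)
    return float(np.mean(sent_bits == recovered_bits))

def looks_authentic(sent_bits, recovered_bits, threshold=0.9):
    """Hypothetical decision rule: accept only if the watermark survived.

    The 0.9 threshold is an assumption chosen for illustration.
    """
    return bit_accuracy(sent_bits, recovered_bits) >= threshold
```

Under this reading, audio that passed through a benign codec should still yield a recovery rate close to 1.0, while audio altered by voice conversion should yield a rate near 0.5 and be rejected.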
Abstract
The rapid advancement of generative AI has made it increasingly challenging to distinguish between deepfake audio and authentic human speech. To overcome the limitations of passive detection methods, we propose StreamMark, a novel deep learning-based, semi-fragile audio watermarking system. StreamMark is designed to be robust against benign audio conversions that preserve semantic meaning (e.g., compression, noise) while remaining fragile to malicious, semantics-altering manipulations (e.g., voice conversion, speech editing). Our method introduces a complex-domain embedding technique within a unique Encoder-Distortion-Decoder architecture, trained explicitly to differentiate between these two classes of transformations. Comprehensive benchmarks demonstrate that StreamMark achieves high imperceptibility (SNR 24.16 dB, PESQ 4.20), is resilient to real-world distortions like Opus encoding,…
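The abstract says the model is trained to differentiate benign from malicious transformations but does not give the loss on this page. As a rough sketch only, one way such a semi-fragile objective could be written is shown below: decoded bit probabilities are pulled toward the true message after a benign distortion and toward 0.5 (chance) after a malicious one. The function names, the use of binary cross-entropy, and the `fragile_weight` parameter are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between probabilities and (soft) targets."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=np.float64)
    losses = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    return float(np.mean(losses))

def semi_fragile_loss(message, probs_benign, probs_malicious, fragile_weight=1.0):
    """Hypothetical objective (an assumption, not the paper's loss).

    message:         embedded bits (0/1)
    probs_benign:    decoder outputs after a benign distortion
    probs_malicious: decoder outputs after a malicious manipulation
    """
    # Robustness term: benign distortions should leave the message decodable.
    robust_term = bce(probs_benign, message)
    # Fragility term: a soft target of 0.5 is minimized when the decoder is
    # at chance, i.e. the watermark has been destroyed.
    chance_target = np.full(len(message), 0.5)
    fragile_term = bce(probs_malicious, chance_target)
    return robust_term + fragile_weight * fragile_term
```

With this form, the fragility term reaches its minimum of ln 2 when every decoded probability is exactly 0.5, so a decoder that still recovers the message after a malicious edit is penalized.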
