StreamMark: A Deep Learning-Based Semi-Fragile Audio Watermarking for Proactive Deepfake Detection

Zhentao Liu; Milos Cernak

arXiv:2604.11917 · eess.AS · April 15, 2026

TL;DR

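StreamMark is a deep learning-based, semi-fragile audio watermarking system for proactive deepfake detection: the watermark survives benign transformations (e.g., compression, noise) but breaks under malicious, semantics-altering manipulations (e.g., voice conversion, speech editing).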

Contribution

The paper introduces a complex-domain embedding technique within an Encoder-Distortion-Decoder architecture, trained explicitly to differentiate benign transformations from malicious ones.

Findings

01 Achieves high imperceptibility with SNR 24.16 dB and PESQ 4.20.

02 Resilient to real-world distortions like Opus encoding.

03 Fragile to deepfake manipulations, with message recovery near chance levels (~50%).
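
To make two of the reported quantities concrete, the sketch below computes a signal-to-noise ratio in dB and a bit recovery accuracy. It uses the standard SNR definition (signal power over the power of the watermark residual) and is not taken from the paper; the function names `snr_db` and `bit_accuracy` are hypothetical, and the paper's exact evaluation protocol may differ. For a random binary message, an accuracy near 0.5 means the decoder does no better than guessing.

```python
import numpy as np


def snr_db(original, watermarked):
    """Standard SNR in dB, treating (watermarked - original) as the noise.

    Assumption: both inputs are time-aligned sample arrays of equal length.
    """
    original = np.asarray(original, dtype=np.float64)
    watermarked = np.asarray(watermarked, dtype=np.float64)
    noise_power = np.sum((watermarked - original) ** 2)
    if noise_power == 0:
        return float("inf")  # identical signals: no watermark residual
    signal_power = np.sum(original ** 2)
    return float(10.0 * np.log10(signal_power / noise_power))


def bit_accuracy(sent_bits, recovered_bits):
    """Fraction of message bits recovered correctly (0.5 is chance level)."""
    sent = np.asarray(sent_bits)
    recovered = np.asarray(recovered_bits)
    return float(np.mean(sent == recovered))
```

A higher SNR means a smaller watermark residual relative to the host signal. PESQ is a perceptual quality score and needs a dedicated implementation, so it is not sketched here.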

Abstract

The rapid advancement of generative AI has made it increasingly challenging to distinguish between deepfake audio and authentic human speech. To overcome the limitations of passive detection methods, we propose StreamMark, a novel deep learning-based, semi-fragile audio watermarking system. StreamMark is designed to be robust against benign audio conversions that preserve semantic meaning (e.g., compression, noise) while remaining fragile to malicious, semantics-altering manipulations (e.g., voice conversion, speech editing). Our method introduces a complex-domain embedding technique within a unique Encoder-Distortion-Decoder architecture, trained explicitly to differentiate between these two classes of transformations. Comprehensive benchmarks demonstrate that StreamMark achieves high imperceptibility (SNR 24.16 dB, PESQ 4.20), is resilient to real-world distortions like Opus encoding,…
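
One way a verifier might turn recovered bits into an accept/reject decision is sketched below. This is an illustrative assumption, not the paper's procedure: it supposes the verifier knows the embedded message, and that after a malicious manipulation each recovered bit matches with probability 0.5, so the number of matches follows a Binomial(n, 0.5) distribution. The function names and the `max_fpr` parameter are hypothetical.

```python
from math import comb


def chance_match_probability(n_bits, min_matches):
    """P(at least min_matches of n_bits agree) when each bit matches with prob 0.5."""
    favourable = sum(comb(n_bits, k) for k in range(min_matches, n_bits + 1))
    return favourable / 2 ** n_bits


def min_matches_for_fpr(n_bits, max_fpr):
    """Smallest match count whose chance-level tail probability is <= max_fpr."""
    for m in range(n_bits + 1):
        if chance_match_probability(n_bits, m) <= max_fpr:
            return m
    return n_bits + 1  # no threshold reaches max_fpr; nothing can be accepted


def watermark_intact(sent_bits, recovered_bits, max_fpr=1e-3):
    """Accept the audio as unmanipulated only if matches beat chance convincingly."""
    matches = sum(int(a == b) for a, b in zip(sent_bits, recovered_bits))
    return matches >= min_matches_for_fpr(len(sent_bits), max_fpr)
```

With a 16-bit message and `max_fpr=1e-3`, this rule requires at least 15 matching bits, so a recovery rate near 50% is rejected while near-perfect recovery after benign processing is accepted.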
