Noise-to-mask Ratio Loss for Deep Neural Network based Audio Watermarking
Martin Moritz, Toni Olán, Tuomas Virtanen

TL;DR
This paper introduces a novel perceptual loss function based on noise-to-mask ratio for deep neural network audio watermarking, improving transparency of watermarks by aligning with human auditory perception.
Contribution
The paper proposes an NMR-based loss function for training neural audio watermarking models, which yields more transparent watermarks than the conventionally used MSE loss.
Findings
Models trained with NMR loss produce more transparent watermarks.
Objective quality measured by PEAQ improves with NMR loss.
Subjective tests (MUSHRA) confirm better perceptual quality with NMR loss.
Abstract
Digital audio watermarking consists in inserting a message into audio signals in a transparent way and can be used to allow automatic recognition of audio material and management of the copyrights. We propose a perceptual loss function to be used in deep neural network based audio watermarking systems. The loss is based on the noise-to-mask ratio (NMR), which is a model of the psychoacoustic masking effect characteristic of the human ear. We use the NMR loss between marked and host signals to train the deep neural models and we evaluate the objective quality with PEAQ and the subjective quality with a MUSHRA test. Both objective and subjective tests show that models trained with NMR loss generate more transparent watermarks than models trained with the conventionally used MSE loss.
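To make the contrast between an NMR-style loss and MSE concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a heavily simplified masking model: equal-width frequency bands instead of critical bands, no spreading function, no absolute threshold of hearing, and a masking threshold set at a fixed offset below the host energy in each band. The function names, the band count, and `mask_offset_db` are hypothetical choices made for illustration only.

```python
import numpy as np

def band_energies(x, frame=512, hop=256, n_bands=24):
    """Per-frame energy in n_bands equal-width bands (a crude stand-in for critical bands)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop : i * hop + frame] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # shape: (n_frames, frame // 2 + 1)
    edges = np.linspace(0, power.shape[1], n_bands + 1).astype(int)
    return np.stack(
        [power[:, a:b].sum(axis=1) for a, b in zip(edges[:-1], edges[1:])], axis=1
    )

def simplified_nmr_db(host, marked, mask_offset_db=-10.0, eps=1e-12):
    """Mean ratio of watermark noise energy to an assumed masking threshold, in dB."""
    noise = band_energies(marked - host)                       # distortion added by the watermark
    mask = band_energies(host) * 10 ** (mask_offset_db / 10)   # assumed threshold: host energy minus offset
    return 10 * np.log10(np.mean(noise / (mask + eps)) + eps)

def mse(host, marked):
    """Plain mean squared error between marked and host signals."""
    return np.mean((marked - host) ** 2)

# Toy comparison: two distortions with the same MSE but different spectral placement.
sr = 16000
t = np.arange(sr) / sr
host = np.sin(2 * np.pi * 1156.25 * t)                    # host: a single tone
in_band = host + 0.01 * np.sin(2 * np.pi * 1200 * t)      # distortion next to the host tone
out_band = host + 0.01 * np.sin(2 * np.pi * 5000 * t)     # distortion far from the host tone

mse_in, mse_out = mse(host, in_band), mse(host, out_band)
nmr_in, nmr_out = simplified_nmr_db(host, in_band), simplified_nmr_db(host, out_band)
```

In this toy case the two distortions have the same MSE, while the simplified NMR is much lower for the distortion that sits next to the host tone, where this crude model treats it as masked. That is the intuition behind replacing MSE with a masking-aware loss. Training a network with such a loss would additionally require a differentiable implementation in a deep learning framework, which this sketch does not attempt.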
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Digital Media Forensic Detection · Image and Signal Denoising Methods
