MerkleSpeech: Public-Key Verifiable, Chunk-Localised Speech Provenance via Perceptual Fingerprints and Merkle Commitments
Tatsunori Ono

TL;DR
MerkleSpeech introduces a system for speech provenance that combines perceptual fingerprinting, Merkle commitments, and cryptographic signatures to verify the origin and integrity of speech segments, even after common transformations.
Contribution
The paper presents a two-tiered approach to speech provenance verification that combines watermark-based attribution with cryptographic Merkle proofs.
Findings
Robust to resampling, filtering, and noise.
Achieves very low false positive rates.
Supports in-band verification with public information.
Abstract
Speech provenance goes beyond detecting whether a watermark is present. Real workflows involve splicing, quoting, trimming, and platform-level transforms that may preserve some regions while altering others. Neural watermarking systems have made strides in robustness and localised detection, but most deployments produce outputs with no third-party verifiable cryptographic proof tying a time segment to an issuer-signed original. Provenance standards like C2PA adopt signed manifests and Merkle-based fragment validation, yet their bindings target encoded assets and break under re-encoding or routine processing. We propose MerkleSpeech, a system for public-key verifiable, chunk-localised speech provenance offering two tiers of assurance. The first, a robust watermark attribution layer (WM-only), survives common distribution transforms and answers "was this chunk issued by a known party?".…
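To make the idea of a Merkle commitment over per-chunk fingerprints concrete, here is a minimal sketch in Python. It is not the paper's construction: the fingerprints are placeholder byte strings standing in for perceptual fingerprints, all function names are hypothetical, and the issuer's public-key signature over the root is omitted because the standard library has no signature primitive. The leaf/node prefix bytes and the "duplicate the last node on odd levels" rule are assumptions chosen for simplicity.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(fingerprint: bytes) -> bytes:
    # Prefix 0x00 separates leaf hashes from interior-node hashes (assumed convention).
    return _h(b"\x00" + fingerprint)


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def _padded(level):
    # On odd-sized levels, pair the last node with a copy of itself (assumed rule).
    return level + [level[-1]] if len(level) % 2 == 1 else level


def build_levels(fingerprints):
    """Return all tree levels, leaves first, root level last."""
    if not fingerprints:
        raise ValueError("need at least one chunk fingerprint")
    level = [leaf_hash(f) for f in fingerprints]
    levels = [level]
    while len(level) > 1:
        p = _padded(level)
        level = [node_hash(p[i], p[i + 1]) for i in range(0, len(p), 2)]
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    """Sibling hashes from leaf to root, each tagged with whether it sits on the left."""
    proof = []
    for level in levels[:-1]:
        p = _padded(level)
        sibling = index ^ 1
        proof.append((p[sibling], sibling < index))
        index //= 2
    return proof


def verify_chunk(fingerprint, proof, root):
    """Recompute the root from one chunk's fingerprint and its proof."""
    acc = leaf_hash(fingerprint)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


# Hypothetical fingerprints for five chunks of one recording.
fingerprints = [f"chunk-{i}".encode() for i in range(5)]
levels = build_levels(fingerprints)
root = levels[-1][0]  # the value an issuer would sign

proof = inclusion_proof(levels, 3)
print(verify_chunk(fingerprints[3], proof, root))   # the committed chunk
print(verify_chunk(b"edited-chunk", proof, root))   # a different fingerprint
```

In this sketch a verifier needs only one chunk's fingerprint, its proof, and the signed root, which is what lets a quoted or trimmed segment be checked without the rest of the recording. A real system would also need fingerprints that stay stable under the transforms the abstract lists, which plain hashing of bytes does not provide.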
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Scientific Computing and Data Management · Generative Adversarial Networks and Image Synthesis
