Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking
Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li, Erman Ayday

TL;DR
This paper argues that AI content watermarks can perform unevenly across languages, cultures, and demographic groups, and proposes evaluation dimensions for checking fairness and robustness before deployment.
Contribution
It introduces three concrete evaluation dimensions for pluralistic watermark benchmarking that address content diversity and bias.
Findings
With one exception, the major watermarking benchmarks reviewed do not report performance across languages, cultural content types, or population groups.
Watermarking fairness standards are lower than those for generative AI systems.
Proposed evaluation dimensions include cross-lingual detection parity, cultural content coverage, and demographic disaggregation.
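As a rough illustration of what cross-lingual detection parity and demographic disaggregation could look like in practice, the sketch below computes per-group detection rates and the spread between the best and worst group. It is not the authors' protocol: the function names, the record layout `(group, is_watermarked, flagged)`, the max-minus-min gap, and the toy records are all assumptions made for this example, and the toy data is invented rather than taken from the paper.

```python
from collections import defaultdict


def detection_rates_by_group(records):
    """Return per-group true/false positive rates.

    `records` is an iterable of (group, is_watermarked, flagged) tuples,
    where `group` is a hypothetical label such as a language code.
    """
    counts = defaultdict(lambda: {"tp": 0, "pos": 0, "fp": 0, "neg": 0})
    for group, is_watermarked, flagged in records:
        c = counts[group]
        if is_watermarked:
            c["pos"] += 1
            c["tp"] += int(flagged)   # watermarked and correctly flagged
        else:
            c["neg"] += 1
            c["fp"] += int(flagged)   # unwatermarked but wrongly flagged
    return {
        group: {
            "tpr": c["tp"] / c["pos"] if c["pos"] else None,
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
        }
        for group, c in counts.items()
    }


def parity_gap(rates, metric):
    """Spread (max minus min) of one metric across groups; None if no data."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values) if values else None


# Invented toy records for illustration only (not results from the paper).
toy_records = [
    ("en", True, True), ("en", True, True),
    ("en", False, False), ("en", False, False),
    ("sw", True, True), ("sw", True, False),
    ("sw", False, True), ("sw", False, False),
]

rates = detection_rates_by_group(toy_records)
print(rates)
print("TPR gap:", parity_gap(rates, "tpr"))
print("FPR gap:", parity_gap(rates, "fpr"))
```

Reporting the false positive rate per group alongside the true positive rate matters for the question in the title: a group with a higher false positive rate is one whose human-made content gets flagged more often.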
Abstract
Watermarking is becoming the default mechanism for AI content authentication, with governance policies and frameworks referencing it as infrastructure for content provenance. Yet across text, image, and audio modalities, watermark signal strength, detectability, and robustness depend on statistical properties of the content itself, properties that vary systematically across languages, cultural visual traditions, and demographic groups. We examine how this content dependence creates modality-specific pathways to bias. Reviewing the major watermarking benchmarks across modalities, we find that, with one exception, none report performance across languages, cultural content types, or population groups. To address this, we propose three concrete evaluation dimensions for pluralistic watermark benchmarking: cross-lingual detection parity, culturally diverse content coverage, and demographic…
