VLA-Mark: A cross modal watermark for large vision-language alignment model

Shuliang Liu; Qi Zheng; Jesse Jiaxi Xu; Yibo Yan; Junyan Zhang; He Geng; Aiwei Liu; Peijie Jiang; Jia Liu; Yik-Cheung Tam; and Xuming Hu

arXiv:2507.14067 · cs.CV · September 22, 2025

Open Access

TL;DR

VLA-Mark is a novel cross-modal watermarking framework for vision-language models that embeds detectable watermarks without compromising semantic alignment, achieving high detection accuracy and robustness against attacks.

Contribution

The paper introduces a cross-modal watermarking method that preserves semantic fidelity and visual-textual coherence without requiring model retraining.

Findings

01. 7.4% lower perplexity (PPL) compared to conventional methods

02. 26.6% higher BLEU score, indicating better language quality

03. 98.8% AUC for watermark detection

Abstract

Vision-language models demand watermarking solutions that protect intellectual property without compromising multimodal coherence. Existing text watermarking methods disrupt visual-textual alignment through biased token selection and static strategies, leaving semantic-critical concepts vulnerable. We propose VLA-Mark, a vision-aligned framework that embeds detectable watermarks while preserving semantic fidelity through cross-modal coordination. Our approach integrates multiscale visual-textual alignment metrics, combining localized patch affinity, global semantic coherence, and contextual attention patterns, to guide watermark injection without model retraining. An entropy-sensitive mechanism dynamically balances watermark strength and semantic preservation, prioritizing visual grounding during low-uncertainty generation phases. Experiments show 7.4% lower PPL and 26.6% higher BLEU…
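To make the entropy-sensitive idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a standard green-list logit bias as the underlying watermark, scales the bias by the normalized entropy of the next-token distribution so that confident (low-uncertainty) steps are left almost untouched, and gives a caller-supplied set of vision-critical token ids the same boost so they are never disadvantaged. All names (`green_list`, `watermark_logits`, `critical_tokens`, `delta`, `gamma`, the key string) are hypothetical, and the way critical tokens are chosen from the image is left out entirely.

```python
import hashlib
import math
import random


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by its maximum, giving a value in [0, 1].

    Assumes a vocabulary of at least two tokens.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def green_list(prev_token, vocab_size, gamma=0.5, key="demo-key"):
    """Pseudo-random 'green' subset of the vocabulary, seeded by the previous token.

    Assumption: a keyed hash of the previous token id seeds the partition.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_logits(logits, prev_token, critical_tokens, delta=2.0, gamma=0.5):
    """Add an entropy-scaled bias to green tokens and to vision-critical tokens.

    Assumption: bias strength = delta * normalized entropy, so near-deterministic
    steps receive almost no bias and the model's own choice dominates.
    """
    strength = delta * normalized_entropy(softmax(logits))
    boosted = green_list(prev_token, len(logits), gamma) | set(critical_tokens)
    return [x + strength if i in boosted else x for i, x in enumerate(logits)]
```

With a flat distribution the bias reaches its full value `delta`, while a sharply peaked distribution receives a bias close to zero. That is one simple way to trade watermark strength against semantic preservation; the paper's actual mechanism may differ.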

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Malware Detection Techniques