A Visual Semantic Adaptive Watermark grounded by Prefix-Tuning for Large Vision-Language Model

Qi Zheng; Shuliang Liu; Yu Huang; Sihang Jia; Jungang Li; Lyuhao Chen; Junhao Chen; Hanqian Li; Aiwei Liu; Yibo Yan; Xuming Hu

arXiv:2601.07291 · cs.CV · January 13, 2026

Open Access

TL;DR

This paper introduces VISA-Mark, a novel watermarking framework for large vision-language models that embeds detectable signals aligned with visual evidence, ensuring high visual fidelity, robustness, and efficiency.

Contribution

VISA-Mark employs a lightweight prefix-tuner and adaptive mechanisms to embed semantic-aware watermarks without disrupting visual grounding or incurring high inference latency.

Findings

1. 7.8% improvement in visual consistency, as measured by the Chair-I metric
2. 96.88% detection AUC
3. 99.3% robustness against attacks

Abstract

Watermarking has emerged as a pivotal solution for content traceability and intellectual property protection in Large Vision-Language Models (LVLMs). However, vision-agnostic watermarks introduce visually irrelevant tokens and disrupt visual grounding by enforcing indiscriminate pseudo-random biases, while some semantic-aware methods incur prohibitive inference latency due to rejection sampling. In this paper, we propose the VIsual Semantic Adaptive Watermark (VISA-Mark), a novel framework that embeds detectable signals while strictly preserving visual fidelity. Our approach employs a lightweight, efficiently trained prefix-tuner to extract dynamic Visual-Evidence Weights, which quantify the evidentiary support for candidate tokens based on the visual input. These weights guide an adaptive vocabulary partitioning and logits perturbation mechanism, concentrating watermark strength…
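To make the idea of evidence-guided logits perturbation concrete, here is a minimal sketch and not the paper's algorithm. It swaps in a generic keyed green-list watermark in which the vocabulary partition is seeded by the previous token, and it assumes that each candidate token already has a Visual-Evidence Weight in [0, 1]. In VISA-Mark these weights come from the prefix-tuner; here they are simply passed in as a list. The function names, the `delta` and `gamma` parameters, and the rule of scaling the bias by the weight are all hypothetical choices made for illustration.

```python
import hashlib
import math
import random


def green_list(prev_token, vocab_size, gamma=0.5, key=42):
    """Pick a keyed pseudo-random 'green' subset of token ids.

    The subset depends only on the secret key and the previous token,
    so a detector holding the key can recompute it without the model.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_logits(logits, evidence, prev_token, delta=2.0, gamma=0.5, key=42):
    """Add a bias to green tokens, scaled by their visual-evidence weight.

    ASSUMPTION: scaling the bias by evidence[i] is an illustrative rule,
    not the mechanism described in the paper. A vision-agnostic baseline
    would correspond to evidence[i] == 1.0 for every token.
    """
    green = green_list(prev_token, len(logits), gamma, key)
    return [
        logit + delta * evidence[i] if i in green else logit
        for i, logit in enumerate(logits)
    ]


def z_score(tokens, vocab_size, gamma=0.5, key=42):
    """Detection statistic: how far the green-token count exceeds chance."""
    n = len(tokens) - 1  # the first token has no predecessor to seed from
    hits = sum(
        tokens[t] in green_list(tokens[t - 1], vocab_size, gamma, key)
        for t in range(1, len(tokens))
    )
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

Under this sketch, a token with no visual support (weight 0) receives no bias even when it falls in the green list, which is one simple way to avoid promoting visually irrelevant tokens. Detection needs only the key and the token sequence, so it adds no rejection sampling at generation time.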

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection