TTP: Test-Time Padding for Adversarial Detection and Robust Adaptation on Vision-Language Models

Zhiwei Li, Yitian Pang, Weining Wang, Zhenan Sun, and Qi Li

arXiv:2512.16523 · cs.CV · March 24, 2026

Open Access

TL;DR

This paper introduces Test-Time Padding (TTP), a lightweight method for detecting and adapting to adversarial attacks on vision-language models like CLIP, improving robustness without sacrificing accuracy.

Contribution

TTP is a novel test-time framework that detects adversarial inputs via feature similarity shifts and employs targeted padding-based adaptation, outperforming existing defenses.

Findings

01. TTP reliably detects adversarial inputs across models and datasets.
02. TTP improves robustness without reducing clean accuracy.
03. In experiments, TTP outperforms state-of-the-art defenses.

Abstract

Vision-Language Models (VLMs), such as CLIP, have achieved impressive zero-shot recognition performance but remain highly susceptible to adversarial perturbations, posing significant risks in safety-critical scenarios. Previous training-time defenses rely on adversarial fine-tuning, which requires labeled data and costly retraining, while existing test-time strategies fail to reliably distinguish between clean and adversarial inputs, thereby preventing both adversarial robustness and clean accuracy from reaching their optimum. To address these limitations, we propose Test-Time Padding (TTP), a lightweight defense framework that performs adversarial detection followed by targeted adaptation at inference. TTP identifies adversarial inputs via the cosine similarity shift between CLIP feature embeddings computed before and after spatial padding, yielding a universal threshold for reliable…
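
The detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the encoder is a hypothetical stand-in for a CLIP image encoder, the image is a toy single-channel grid, and the names `pad_image`, `padding_shift`, and `is_adversarial` are invented for illustration. The abstract does not state the threshold value or the direction of the comparison, so the sketch assumes that a larger similarity shift indicates an adversarial input and leaves the threshold as a caller-supplied parameter.

```python
import math
from typing import Callable, List, Sequence

Image = List[List[float]]                      # toy single-channel image (rows of pixels)
Encoder = Callable[[Image], Sequence[float]]   # hypothetical stand-in for a CLIP image encoder


def pad_image(image: Image, pad: int, fill: float = 0.0) -> Image:
    """Return a copy of `image` with a constant border of `pad` pixels on every side."""
    width = len(image[0]) + 2 * pad
    border = [[fill] * width for _ in range(pad)]
    body = [[fill] * pad + list(row) + [fill] * pad for row in image]
    return border + body + [list(row) for row in border]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; defined as 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def padding_shift(image: Image, encode: Encoder, pad: int) -> float:
    """Similarity shift: 1 minus the cosine similarity between the embeddings
    of the original image and its padded copy."""
    return 1.0 - cosine_similarity(encode(image), encode(pad_image(image, pad)))


def is_adversarial(image: Image, encode: Encoder, pad: int, threshold: float) -> bool:
    """Flag the input when the shift exceeds `threshold`.

    Assumption: a larger shift indicates an adversarial input. The threshold
    value is not given in the abstract and must be supplied by the caller.
    """
    return padding_shift(image, encode, pad) > threshold
```

In practice `encode` would wrap a real CLIP image encoder and the padded image would be resized to the model's input resolution; both details are omitted here because the truncated abstract does not specify them.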

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications