Calibrating Uncertainty for Zero-Shot Adversarial CLIP

Wenjing Lu; Zerui Tao; Dongping Zhang; Yuning Qiu; Yang Yang; Qibin Zhao

arXiv:2512.12997·cs.CV·December 16, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a novel adversarial fine-tuning method for CLIP that improves uncertainty calibration under adversarial attacks, balancing robustness and zero-shot generalization.

Contribution

It proposes a Dirichlet distribution-based output reparameterization and a unified objective to calibrate uncertainty while enhancing adversarial robustness.
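
To make the idea of a Dirichlet-based output concrete, here is a minimal sketch of one common evidential-style mapping from class logits to Dirichlet concentrations. The mapping `alpha = exp(logit) + 1`, the function name `dirichlet_from_logits`, and the toy logits are assumptions for illustration only; the paper's own reparameterization (its Definition 4.1) is not reproduced on this page and may differ.

```python
import math

def dirichlet_from_logits(logits):
    """Map K class logits to Dirichlet concentrations, mean probabilities, and vacuity.

    Assumption: alpha_k = exp(logit_k) + 1, a common evidential-style choice.
    This is NOT taken from the paper; its Definition 4.1 may use another mapping.
    """
    evidence = [math.exp(z) for z in logits]   # non-negative evidence per class
    alpha = [e + 1.0 for e in evidence]        # Dirichlet concentration parameters
    strength = sum(alpha)                      # total concentration S
    probs = [a / strength for a in alpha]      # mean of the Dirichlet
    vacuity = len(alpha) / strength            # K / S: high when evidence is scarce
    return alpha, probs, vacuity

# Hypothetical logits: one strongly supported class vs. no evidence for any class.
confident = dirichlet_from_logits([6.0, 0.0, 0.0])
uncertain = dirichlet_from_logits([0.0, 0.0, 0.0])
```

In this sketch the total concentration acts as a confidence measure that is separate from the class probabilities, so two inputs with the same predicted class can still carry different uncertainty.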

Findings

01. Restores calibrated uncertainty under adversarial perturbations

02. Maintains competitive zero-shot classification accuracy

03. Improves reliability of uncertainty estimates in adversarial settings

Abstract

CLIP delivers strong zero-shot classification but remains highly vulnerable to adversarial attacks. Previous work on adversarial fine-tuning largely focuses on matching the predicted logits between clean and adversarial examples, which overlooks uncertainty calibration and may degrade zero-shot generalization. A common expectation in reliable uncertainty estimation is that predictive uncertainty should increase as inputs become more difficult or shift away from the training distribution. However, we frequently observe the opposite in the adversarial setting: perturbations not only degrade accuracy but also suppress uncertainty, leading to severe miscalibration and unreliable over-confidence. This overlooked phenomenon highlights a critical reliability gap beyond robustness. To bridge this gap, we propose a novel adversarial fine-tuning objective for CLIP considering both prediction…
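
The miscalibration described in the abstract is commonly quantified with the expected calibration error (ECE), which compares average confidence with accuracy inside confidence bins. The sketch below is a generic equal-width-bin ECE, not the paper's evaluation code; the function name, the bin count, and the toy confidence values are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |avg confidence - accuracy| over bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece

# Hypothetical toy data: confidence matches accuracy in the first case,
# while the second case is confident but mostly wrong.
ece_calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
ece_overconfident = expected_calibration_error([0.95] * 4, [True, False, False, False])
```

The second case mirrors the failure mode the abstract highlights: accuracy drops while confidence stays high, so the gap between the two, and therefore the ECE, grows.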

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The motivation for improving CLIP's zero-shot robustness is well articulated, and the proposed method is supported by thorough experiments. The authors provide comprehensive evaluations on various datasets and attack methods, demonstrating the effectiveness of their approach. The ablation studies further validate the contributions of different components of the proposed loss function.

Weaknesses

1. The choice of the concentration parameter alpha for the Dirichlet distribution in Definition 4.1 is not well justified. The authors should explain how this parameter is chosen and how sensitive performance is to it.
2. The proposed method shows lower performance on certain datasets (SUN397 and PCAM), as seen in Table 1. The authors should discuss potential reasons for this discrepancy.
See the questions.

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. The paper introduces a Dirichlet-based reformulation of CLIP’s logits, which provides a theoretically grounded way to capture both inter-class relationships and predictive confidence.
2. The theoretical analysis and derivations are presented clearly and are easy to follow, making the methodology accessible to readers.
3. The paper is well-structured, with a logical flow from motivation to method to experiments, which helps communicate the ideas effectively.
4. The experiments are extensive, co

Weaknesses

1. The paper is motivated by the observation that CLIP can produce overconfident predictions under adversarial attacks, revealing a gap between accuracy and predictive uncertainty. However, this motivation is not sufficiently novel to fully justify the proposed solution.
2. The paper focuses on calibrating uncertainty for zero-shot adversarial CLIP, but it does not clearly explain why the proposed method is specific to CLIP or zero-shot learning. It appears that similar results could be achieved

Reviewer 03 · Rating 4 · Confidence 4

Strengths

(1) The paper is clearly written and easy to follow. (2) Incorporating the Dirichlet parameterization technique into adversarial training is interesting.

Weaknesses

(1) Beyond CLIP, the Dirichlet parameterization is a general technique and could also be applied to traditional adversarial training for image classification. The adversarial training community has produced a lot of existing work on improving adversarial robustness, such as DKL [ref1], ACAT [ref2], and TRADES. Would it be possible to include experiments on a standard benchmark such as CIFAR-100 or CIFAR-10 under DKL, ACAT, and TRADES setups to demonstrate that the proposed Dirichle

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)