Calibrating Uncertainty for Zero-Shot Adversarial CLIP
Wenjing Lu, Zerui Tao, Dongping Zhang, Yuning Qiu, Yang Yang, Qibin Zhao

TL;DR
This paper introduces a novel adversarial fine-tuning method for CLIP that improves uncertainty calibration under adversarial attacks, balancing robustness and zero-shot generalization.
Contribution
It proposes a Dirichlet distribution-based output reparameterization and a unified objective to calibrate uncertainty while enhancing adversarial robustness.
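As a rough illustration of what a Dirichlet-based output reparameterization can look like, the sketch below maps logits to Dirichlet concentration parameters and reads off a mean prediction and a scalar uncertainty. The mapping alpha_k = 1 + softplus(z_k) and the "vacuity" measure K / sum(alpha) are assumptions borrowed from common evidential-learning practice; the paper's own Definition 4.1 is not reproduced on this page and may differ. All function names are hypothetical.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dirichlet_from_logits(logits):
    # ASSUMPTION: alpha_k = 1 + softplus(z_k); not taken from the paper.
    return [1.0 + softplus(z) for z in logits]

def dirichlet_summary(alpha):
    # Mean of Dirichlet(alpha) and a vacuity-style uncertainty K / sum(alpha).
    strength = sum(alpha)
    mean = [a / strength for a in alpha]
    vacuity = len(alpha) / strength
    return mean, vacuity

# Toy logits (made up): one peaked prediction, one nearly flat prediction.
confident_mean, confident_u = dirichlet_summary(dirichlet_from_logits([8.0, 0.0, -1.0]))
unsure_mean, unsure_u = dirichlet_summary(dirichlet_from_logits([0.2, 0.0, -0.1]))
print(confident_u, unsure_u)
```

Under this assumed mapping, low total evidence yields an uncertainty near 1 and strong evidence pushes it toward 0, so confidence is carried by the magnitude of alpha rather than only by the normalized probabilities.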
Findings
Restores calibrated uncertainty under adversarial perturbations
Maintains competitive zero-shot classification accuracy
Improves reliability of uncertainty estimates in adversarial settings
Abstract
CLIP delivers strong zero-shot classification but remains highly vulnerable to adversarial attacks. Prior work on adversarial fine-tuning largely focuses on matching the predicted logits between clean and adversarial examples, which overlooks uncertainty calibration and may degrade zero-shot generalization. A common expectation in reliable uncertainty estimation is that predictive uncertainty should increase as inputs become more difficult or shift away from the training distribution. However, we frequently observe the opposite in the adversarial setting: perturbations not only degrade accuracy but also suppress uncertainty, leading to severe miscalibration and unreliable over-confidence. This overlooked phenomenon highlights a critical reliability gap beyond robustness. To bridge this gap, we propose a novel adversarial fine-tuning objective for CLIP considering both prediction…
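The miscalibration described in the abstract is often quantified with the expected calibration error (ECE), the bin-weighted average gap between confidence and accuracy. The sketch below is a minimal pure-Python version of that standard metric; whether the paper reports ECE specifically is not stated on this page, and the toy confidences and correctness flags are made up for illustration, not results from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    # ECE = sum over bins of (bin size / N) * |bin accuracy - bin mean confidence|.
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Toy inputs (made up): the "adversarial" case is less accurate yet more confident.
clean_ece = expected_calibration_error([0.9, 0.8, 0.7, 0.6], [True, True, True, False])
adv_ece = expected_calibration_error([0.95, 0.9, 0.9, 0.85], [False, False, True, False])
print(clean_ece, adv_ece)
```

In the toy inputs, accuracy drops while confidence rises, which is the over-confidence pattern the abstract describes, and the ECE grows accordingly.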
Peer Reviews
Decision · Submitted to ICLR 2026
The motivation for improving CLIP's zero-shot robustness is well articulated, and the proposed method is supported by thorough experiments. The authors provide comprehensive evaluations on various datasets and attack methods, demonstrating the effectiveness of their approach. The ablation studies further validate the contributions of different components of the proposed loss function.
1. The choice of the concentration parameter alpha for the Dirichlet distribution in Definition 4.1 is not well justified. The authors should explain how this parameter is chosen and how sensitive performance is to it. 2. The proposed method shows lower performance on certain datasets (SUN397 and PCAM), as seen in Table 1. The authors should discuss potential reasons for this discrepancy. See the questions.
1. The paper introduces a Dirichlet-based reformulation of CLIP’s logits, which provides a theoretically grounded way to capture both inter-class relationships and predictive confidence. 2. The theoretical analysis and derivations are presented clearly and are easy to follow, making the methodology accessible to readers. 3. The paper is well-structured, with a logical flow from motivation to method to experiments, which helps communicate the ideas effectively. 4. The experiments are extensive, co
1. The paper is motivated by the observation that CLIP can produce overconfident predictions under adversarial attacks, revealing a gap between accuracy and predictive uncertainty. However, this motivation is not sufficiently novel to fully justify the proposed solution. 2. The paper focuses on calibrating uncertainty for zero-shot adversarial CLIP, but it does not clearly explain why the proposed method is specific to CLIP or zero-shot learning. It appears that similar results could be achieved
(1) The paper is clearly written and easy to follow. (2) Incorporating the Dirichlet parameterization technique into adversarial training is interesting.
(1) Beyond CLIP, the Dirichlet parameterization is a general technique and could also be applied to traditional adversarial training on the image classification task. In the adversarial training community, there is a lot of existing work on improving adversarial robustness, such as DKL [ref1], ACAT [ref2], and TRADES. Would it be possible to include experiments on a standard benchmark such as CIFAR-100 or CIFAR-10 under DKL, ACAT, and TRADES setups to demonstrate that the proposed Dirichle
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)
