TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability
Fengji Ma, Li Liu, Hei Victor Cheng

TL;DR
This paper introduces TIMA, a method that balances zero-shot adversarial robustness and generalization in CLIP models through mutual awareness mechanisms between text and image embeddings, improving robustness under large adversarial perturbations.
Contribution
The paper proposes TIMA, a new approach combining image-aware text tuning and text-aware image tuning with knowledge distillation to enhance robustness and preserve generalization in large-scale models.
Findings
Improved zero-shot adversarial robustness against large perturbations.
Maintained zero-shot generalization capabilities of CLIP.
Effective balance between robustness and generalization demonstrated through experiments.
Abstract
This work addresses the challenge of achieving zero-shot adversarial robustness while preserving zero-shot generalization in large-scale foundation models, with a focus on the popular Contrastive Language-Image Pre-training (CLIP). Although foundation models were reported to have exceptional zero-shot generalization, they are highly vulnerable to adversarial perturbations. Existing methods achieve a comparably good tradeoff between zero-shot adversarial robustness and generalization under small adversarial perturbations. However, they fail to achieve a good tradeoff under large adversarial perturbations. To this end, we propose a novel Text-Image Mutual Awareness (TIMA) method that strikes a balance between zero-shot adversarial robustness and generalization. More precisely, we propose an Image-Aware Text (IAT) tuning mechanism that increases the inter-class distance of text embeddings…
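As a rough illustration of why the inter-class distance of text embeddings can matter for zero-shot robustness, the sketch below classifies an image embedding by cosine similarity to per-class text embeddings, as CLIP-style zero-shot classification does. It is not the TIMA method or the IAT mechanism itself. The 2-D embeddings, the class names, and the helper functions `cosine`, `zero_shot_predict`, and `mean_inter_class_distance` are all hypothetical and chosen only to make the geometry visible.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_predict(image_emb, text_embs):
    """Return the class whose text embedding is most similar to the image embedding."""
    return max(text_embs, key=lambda name: cosine(image_emb, text_embs[name]))

def mean_inter_class_distance(text_embs):
    """Average cosine distance (1 - similarity) over all distinct class pairs."""
    names = list(text_embs)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return sum(1.0 - cosine(text_embs[a], text_embs[b]) for a, b in pairs) / len(pairs)

# Hypothetical toy embeddings (assumption: invented for illustration only).
close_text = {"cat": [1.0, 0.0], "dog": [0.9, 0.1]}    # classes nearly aligned
spread_text = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}   # classes far apart

clean_image = [0.97, 0.03]      # embedding of a clean "cat" image
perturbed_image = [0.93, 0.09]  # same image after a hypothetical perturbation

print(zero_shot_predict(clean_image, close_text))       # clean prediction
print(zero_shot_predict(perturbed_image, close_text))   # prediction with close classes
print(zero_shot_predict(perturbed_image, spread_text))  # prediction with spread classes
print(mean_inter_class_distance(close_text), mean_inter_class_distance(spread_text))
```

In this toy setup the same shift in the image embedding flips the prediction when the two text embeddings are nearly aligned, but not when they are far apart. This is only a geometric intuition under the stated assumptions, not a reproduction of the paper's experiments.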
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Focus · Contrastive Language-Image Pre-training · Knowledge Distillation
