TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability

Fengji Ma; Li Liu; Hei Victor Cheng

arXiv:2405.17678 · cs.CV · May 29, 2024

Open Access

TL;DR

This paper introduces TIMA, a method that balances zero-shot adversarial robustness and generalization in CLIP models through mutual awareness mechanisms for text and image embeddings, improving robustness against large adversarial perturbations.

Contribution

The paper proposes TIMA, a new approach combining image-aware text tuning and text-aware image tuning with knowledge distillation to enhance robustness and preserve generalization in large-scale models.

Findings

01

Improved zero-shot adversarial robustness against large perturbations.

02

Maintained zero-shot generalization capabilities of CLIP.

03

Effective balance between robustness and generalization demonstrated through experiments.

Abstract

This work addresses the challenge of achieving zero-shot adversarial robustness while preserving zero-shot generalization in large-scale foundation models, with a focus on the popular Contrastive Language-Image Pre-training (CLIP). Although foundation models are reported to have exceptional zero-shot generalization, they are highly vulnerable to adversarial perturbations. Existing methods achieve a comparably good tradeoff between zero-shot adversarial robustness and generalization under small adversarial perturbations. However, they fail to achieve a good tradeoff under large adversarial perturbations. To this end, we propose a novel Text-Image Mutual Awareness (TIMA) method that strikes a balance between zero-shot adversarial robustness and generalization. More precisely, we propose an Image-Aware Text (IAT) tuning mechanism that increases the inter-class distance of text embeddings…
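
To make the phrase "inter-class distance of text embeddings" concrete, here is a minimal sketch of one way such a quantity could be measured: the mean pairwise cosine distance between per-class embedding vectors. This is not the TIMA or IAT procedure, which the truncated abstract does not spell out. The function names `cosine_distance` and `mean_inter_class_distance` and the toy 2-D vectors are assumptions made purely for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_inter_class_distance(class_embeddings):
    """Average cosine distance over all unordered pairs of class embeddings."""
    pairs = list(combinations(class_embeddings, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D stand-ins for class text embeddings (made-up values, not CLIP outputs).
crowded = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1]]   # classes nearly collinear
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # classes far apart in angle

print(mean_inter_class_distance(crowded))
print(mean_inter_class_distance(spread))
```

In this toy setup the `spread` set scores higher than the `crowded` set, which is the direction the abstract describes: class embeddings that sit farther apart leave more margin before a perturbed image embedding drifts closer to the wrong class.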

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Focus · Contrastive Language-Image Pre-training · Knowledge Distillation