Robustness of Vision Foundation Models to Common Perturbations
Hongbin Liu, Zhengyuan Jiang, Cheng Hong, Neil Zhenqiang Gong

TL;DR
This paper systematically evaluates the robustness of vision foundation models to common image perturbations, revealing general non-robustness and proposing metrics and fine-tuning methods to improve resilience.
Contribution
It introduces three robustness metrics, analyzes their properties, evaluates six industry models, and proposes a fine-tuning approach to enhance robustness.
Findings
Models are generally non-robust to common perturbations.
Perturbations degrade downstream task performance.
Robustness metrics can predict performance impacts.
Abstract
A vision foundation model outputs an embedding vector for an image, which can be affected by common editing operations (e.g., JPEG compression, brightness, contrast adjustments). These common perturbations alter embedding vectors and may impact the performance of downstream tasks using these embeddings. In this work, we present the first systematic study on foundation models' robustness to such perturbations. We propose three robustness metrics and formulate five desired mathematical properties for these metrics, analyzing which properties they satisfy or violate. Using these metrics, we evaluate six industry-scale foundation models (OpenAI, Meta) across nine common perturbation categories, finding them generally non-robust. We also show that common perturbations degrade downstream application performance (e.g., classification accuracy) and that robustness values can predict performance…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
