An Investigation of Visual Foundation Models Robustness

Sandeep Gupta; Roberto Passerone

arXiv:2508.16225·cs.CV·August 25, 2025

TL;DR

This paper analyzes the robustness of Visual Foundation Models in computer vision, focusing on their ability to handle real-world challenges like environmental variability and adversarial attacks, and reviews existing defense strategies.

Contribution

The paper analyzes the robustness requirements of Visual Foundation Models, the challenges facing defense mechanisms, and the benchmarking metrics used to evaluate these models in dynamic environments.

Findings

01. Empirical defenses and robust training improve model resilience.
02. Challenges include network properties affecting robustness.
03. Benchmarking metrics are essential for evaluation.

Abstract

Visual Foundation Models (VFMs) are becoming ubiquitous in computer vision, powering systems for diverse tasks such as object detection, image classification, segmentation, pose estimation, and motion tracking. VFMs are capitalizing on seminal innovations in deep learning models, such as LeNet-5, AlexNet, ResNet, VGGNet, InceptionNet, DenseNet, YOLO, and ViT, to deliver superior performance across a range of critical computer vision applications. These include security-sensitive domains like biometric verification, autonomous vehicle perception, and medical image analysis, where robustness is essential to fostering trust between technology and the end-users. This article investigates network robustness requirements crucial in computer vision systems to adapt effectively to dynamic environments influenced by factors such as lighting, weather conditions, and sensor characteristics. We…
