Do Vision-Language Foundational models show Robust Visual Perception?

Shivam Chandhok; Pranav Tandon

arXiv:2408.06781·cs.CV·August 14, 2024

TL;DR

This paper investigates whether vision-language foundational models maintain robustness under various real-world distribution shifts like noise and weather effects, comparing their generalization capabilities to human perception.

Contribution

It provides a comprehensive analysis of the robustness of diverse vision-language models under multiple distribution shifts, highlighting their strengths and limitations.

Findings

01 Models show varying robustness to different corruptions.

02 Performance degrades significantly under severe shifts.

03 The analysis offers insight into the generalization capabilities of vision-language models.

Abstract

Recent advances in vision-language foundational models have enabled the development of systems that can perform visual understanding and reasoning tasks. However, it is unclear whether these models are robust to distribution shifts, and how their performance and generalization capabilities vary under changes in data distribution. In this project we strive to answer the question "Are vision-language foundational models robust to distribution shifts like human perception?" Specifically, we consider a diverse range of vision-language models and compare how the performance of these systems is affected by corruption-based distribution shifts (such as motion blur, fog, snow, and Gaussian noise) commonly found in practical real-world scenarios. We analyse the generalization capabilities qualitatively and quantitatively on the zero-shot image classification task under the aforementioned distribution…
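As a rough illustration of the kind of evaluation the abstract describes, the sketch below applies one corruption (Gaussian noise) at increasing severity and records classification accuracy at each level. It is a minimal sketch under stated assumptions, not the authors' pipeline: the severity-to-sigma table, the function names, and the flat list-of-floats image format are all hypothetical, and `classify` stands in for whatever zero-shot classifier is being evaluated.

```python
import random

# Hypothetical severity levels; the noise scales used in the paper are not
# stated in this abstract, so these values are illustrative only.
SIGMA_BY_SEVERITY = {1: 0.04, 2: 0.08, 3: 0.12, 4: 0.18, 5: 0.26}


def add_gaussian_noise(pixels, severity, rng):
    """Return a noisy copy of `pixels` (floats in [0, 1]), clipped to [0, 1]."""
    sigma = SIGMA_BY_SEVERITY[severity]
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def accuracy(classify, images, labels):
    """Fraction of images for which `classify` returns the correct label."""
    correct = sum(1 for img, y in zip(images, labels) if classify(img) == y)
    return correct / len(images)


def robustness_curve(classify, images, labels, seed=0):
    """Map severity (0 = clean) to accuracy under Gaussian-noise corruption."""
    rng = random.Random(seed)
    curve = {0: accuracy(classify, images, labels)}
    for severity in sorted(SIGMA_BY_SEVERITY):
        corrupted = [add_gaussian_noise(img, severity, rng) for img in images]
        curve[severity] = accuracy(classify, corrupted, labels)
    return curve


# Toy usage: a stand-in "classifier" that thresholds mean brightness.
toy_images = [[0.1] * 16, [0.9] * 16]
toy_labels = [0, 1]
toy_classify = lambda img: int(sum(img) / len(img) > 0.5)
curve = robustness_curve(toy_classify, toy_images, toy_labels)
```

Comparing `curve[0]` with the entries for higher severities gives the accuracy drop attributable to the corruption; the same loop could be repeated for other shifts such as blur or fog by swapping in a different corruption function.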

Code & Models

Repositories

shivam-chandhok/cpsc-540-project (Official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Constraint Satisfaction and Optimization · Categorization, perception, and language