VFA: Vision Frequency Analysis of Foundation Models and Human

Mohammad-Javad Darvishi-Bayazi; Md Rifat Arefin; Jocelyn Faubert; Irina Rish

arXiv:2409.05817·cs.CV·September 10, 2024

TL;DR

This paper examines which characteristics of large-scale vision models affect their alignment with human perception and their robustness to distribution shifts, highlighting the roles of model size, data size, semantic richness, and multimodal training.

Contribution

It introduces a comprehensive analysis of factors influencing model-human alignment and robustness, emphasizing the importance of size, semantic richness, and multimodal data.

Findings

01. Larger models and datasets improve alignment with human perception.

02. Rich semantic information enhances model robustness.

03. Multimodal models show better out-of-distribution performance.

Abstract

Machine learning models often struggle with distribution shifts in real-world scenarios, whereas humans exhibit robust adaptation. Models that better align with human perception may achieve higher out-of-distribution generalization. In this study, we investigate how various characteristics of large-scale computer vision models influence their alignment with human capabilities and robustness. Our findings indicate that increasing model and data size and incorporating rich semantic information and multiple modalities enhance models' alignment with human perception and their overall robustness. Our empirical analysis demonstrates a strong correlation between out-of-distribution accuracy and human alignment.
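The correlation claim at the end of the abstract can be made concrete with a small sketch. The abstract does not say which correlation statistic was used, so the Pearson coefficient below is an assumption, and the function name `pearson_r` and the two input lists are hypothetical placeholders for illustration, not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT measurements from the paper.
# Each position stands for one hypothetical model.
ood_accuracy = [0.2, 0.4, 0.6, 0.8]
human_alignment = [0.1, 0.3, 0.4, 0.7]

r = pearson_r(ood_accuracy, human_alignment)
print(f"Pearson r = {r:.3f}")
```

With real measurements, each list would hold one value per evaluated model. A rank-based statistic such as Spearman's could be substituted if the relationship is monotonic but not linear.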

Code & Models

Repositories

MohammadJavadD/vfa

Taxonomy

Topics: 3D Surveying and Cultural Heritage

Methods: ALIGN