Does language help generalization in vision models?

Benjamin Devillers; Bhavin Choksi; Romain Bielawski; Rufin VanRullen

arXiv:2104.08313 · cs.AI · September 16, 2021

TL;DR

This paper systematically evaluates whether language supervision improves the generalization of vision models. It finds that multimodal training does not outperform standard supervised visual training on unsupervised clustering, few-shot learning, transfer learning, or adversarial robustness.

Contribution

The study provides a comprehensive comparison of multimodal versus vision-only models, showing that current multimodal approaches do not improve generalization capabilities.

Findings

1. Multimodal training does not outperform vision-only training in clustering, few-shot, transfer, or adversarial robustness tasks.

2. Semantic grounding alone does not enhance vision model generalization.

3. Further work is needed to leverage language for better vision model performance.

Abstract

Vision models trained on multimodal datasets can benefit from the wide availability of large image-caption datasets. A recent model (CLIP) was found to generalize well in zero-shot and transfer learning settings. This could imply that linguistic or "semantic grounding" confers additional generalization abilities to the visual feature space. Here, we systematically evaluate various multimodal architectures and vision-only models in terms of unsupervised clustering, few-shot learning, transfer learning and adversarial robustness. In each setting, multimodal training produced no additional generalization capability compared to standard supervised visual training. We conclude that work is still required for semantic grounding to help improve vision models.
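To make the few-shot setting mentioned in the abstract concrete, here is a minimal sketch of one common way to probe a frozen feature space: nearest-prototype classification, where each class is represented by the mean of a handful of labelled feature vectors. This is an illustrative assumption, not the protocol used in the paper (see the official repository for that). The function names and the toy feature vectors are hypothetical; in practice the vectors would come from the vision encoder under evaluation.

```python
from collections import defaultdict
import math


def class_prototypes(support):
    """Average the feature vectors of each class in the support set.

    `support` is a list of (feature_vector, label) pairs. In a real
    evaluation the vectors would be embeddings from a frozen encoder;
    here they are plain lists of floats (an assumption for illustration).
    """
    grouped = defaultdict(list)
    for features, label in support:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(prototypes, features):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], features))


def few_shot_accuracy(support, query):
    """Fraction of query examples assigned to their true class."""
    prototypes = class_prototypes(support)
    correct = sum(predict(prototypes, features) == label for features, label in query)
    return correct / len(query)


# Toy 2-shot, 2-class example with made-up 2-D "features".
toy_support = [([0.0, 0.0], "a"), ([0.0, 1.0], "a"), ([5.0, 5.0], "b"), ([5.0, 6.0], "b")]
toy_query = [([0.2, 0.4], "a"), ([4.9, 5.2], "b"), ([1.0, 1.0], "b")]
print(few_shot_accuracy(toy_support, toy_query))
```

Running the same function on features from a multimodal encoder and from a vision-only encoder, with identical support and query splits, gives the kind of like-for-like comparison the abstract describes. The choice of distance and the number of shots are free parameters of this sketch.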

Code & Models

Repositories

bdvllrs/generalization-vision (PyTorch, official)
