Improving generalization by mimicking the human visual diet

Spandan Madan; You Li; Mengmi Zhang; Hanspeter Pfister; Gabriel Kreiman

arXiv:2206.07802 · cs.CV · January 11, 2024 · 1 citation

TL;DR

This paper proposes mimicking the human visual diet by training models on diverse, context-rich data to improve their ability to generalize across real-world visual transformations and from synthetic to natural images.

Contribution

It introduces a new dataset that emulates the human visual diet and a transformer model designed to exploit it, substantially improving generalization in computer vision tasks.

Findings

1. Models trained with a human-like visual diet outperform specialized architectures by large margins when tested on natural images.
2. Incorporating scene context and real-world transformations improves robustness to lighting, viewpoint, and material changes.
3. The approach narrows the gap when generalizing from synthetic to real-world data.

Abstract

We present a new perspective on bridging the generalization gap between biological and computer vision: mimicking the human visual diet. While computer vision models rely on internet-scraped datasets, humans learn from limited 3D scenes under diverse real-world transformations with objects in natural context. Our results demonstrate that incorporating variations and contextual cues ubiquitous in the human visual training data (visual diet) significantly improves generalization to real-world transformations such as lighting, viewpoint, and material changes. This improvement also extends to generalizing from synthetic to real-world data: all models trained with a human-like visual diet outperform specialized architectures by large margins when tested on natural image data. These experiments are enabled by our two key contributions: a novel dataset capturing scene context and diverse…
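"Generalization to real-world transformations" is easiest to picture as a held-out-condition split: the model never sees one value of a transformation (say, one lighting setup) during training and is evaluated only on it. The sketch below is a hypothetical illustration of such a split, not the paper's dataset or evaluation protocol; the factor names, their values, and `split_by_factor` are all invented for this example.

```python
from itertools import product

# Hypothetical transformation factors and values (illustrative only,
# not taken from the paper's dataset).
FACTORS = {
    "lighting": ["left", "right", "overhead"],
    "viewpoint": ["front", "side", "top"],
    "material": ["metal", "wood", "plastic"],
}


def split_by_factor(objects, factor, held_out_value):
    """Return (train, test) lists of sample dicts.

    Every sample whose `factor` equals `held_out_value` goes to the test set,
    so that value of the transformation is never seen during training.
    """
    names = list(FACTORS)
    train, test = [], []
    for obj in objects:
        # Enumerate every combination of transformation values for this object.
        for values in product(*(FACTORS[n] for n in names)):
            sample = {"object": obj, **dict(zip(names, values))}
            (test if sample[factor] == held_out_value else train).append(sample)
    return train, test


if __name__ == "__main__":
    train, test = split_by_factor(["mug", "chair"], "lighting", "overhead")
    print(len(train), len(test))  # 36 18
```

Holding out an entire factor value, rather than a random subset of images, is what makes the test set measure generalization to an unseen transformation; a random split would leak every lighting condition into training.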

Code & Models

Repositories

spandan-madan/human_visual_diet
PyTorch · Official

Taxonomy

Topics: Cell Image Analysis Techniques · Human Pose and Action Recognition · Advanced Vision and Imaging

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Dense Connections · Layer Normalization · Vision Transformer
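The methods listed above (multi-head attention, softmax, linear/dense layers, residual connections, layer normalization) are the building blocks of a standard Vision Transformer encoder block. The following is a minimal, dependency-free sketch of a generic pre-norm ViT-style block, assuming no biases, no learned layer-norm scale/shift, no patch embedding, and random weights; it is the textbook block, not the authors' model, and all dimensions are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(x, eps=1e-5):
    """Normalize each token to zero mean, unit variance (no learned scale/shift)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def gelu(v):
    # tanh approximation of GELU
    return 0.5 * v * (1 + math.tanh(math.sqrt(2 / math.pi) * (v + 0.044715 * v ** 3)))


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention split across `num_heads` heads."""
    d_head = len(x[0]) // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        qh = [row[cols] for row in q]
        kh = [row[cols] for row in k]
        vh = [row[cols] for row in v]
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_head) for kj in kh]
                  for qi in qh]
        heads.append(matmul([softmax(r) for r in scores], vh))
    # Concatenate heads along the feature axis, then apply the output projection.
    concat = [sum((heads[h][i] for h in range(num_heads)), []) for i in range(len(x))]
    return matmul(concat, w_o)


def add(a, b):
    """Elementwise sum of two matrices (the residual connection)."""
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(a, b)]


def encoder_block(x, params, num_heads):
    """Pre-norm block: x + MHA(LN(x)), then x + MLP(LN(x))."""
    x = add(x, multi_head_attention(layer_norm(x), *params["attn"], num_heads))
    hidden = [[gelu(v) for v in row] for row in matmul(layer_norm(x), params["w1"])]
    return add(x, matmul(hidden, params["w2"]))


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_hidden, num_heads, num_tokens = 8, 16, 2, 4
    params = {
        "attn": [random_matrix(d_model, d_model, rng) for _ in range(4)],
        "w1": random_matrix(d_model, d_hidden, rng),
        "w2": random_matrix(d_hidden, d_model, rng),
    }
    tokens = random_matrix(num_tokens, d_model, rng)
    out = encoder_block(tokens, params, num_heads)
    print(len(out), len(out[0]))  # 4 8
```

The residual additions keep each token's shape fixed at `d_model`, which is what allows blocks like this to be stacked; a full ViT would add a patch embedding, position embeddings, learned biases, and a classification head on top.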