Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

Sora Takashima; Ryo Hayamizu; Nakamasa Inoue; Hirokatsu Kataoka; Rio Yokota

arXiv:2303.01112 · cs.CV · March 3, 2023 · 1 citation

Open Access

TL;DR

This paper introduces VisualAtom, a synthetic dataset generated systematically from circular harmonics. Pre-training vision transformers on it reaches near state-of-the-art accuracy with fewer pre-training images.

Contribution

The work develops a novel systematic approach for designing contour-oriented synthetic datasets, significantly improving pre-training effectiveness for vision transformers.

Findings

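01. Pre-training ViT-Base on VisualAtom-21k yields 83.7% top-1 accuracy after fine-tuning on ImageNet-1k.

02. Unlike a static real-image dataset, a synthetic dataset can keep improving as its generation method is refined.

03. FDSL with synthetic data avoids issues tied to real images, such as privacy concerns and labeling costs.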

Abstract

Formula-driven supervised learning (FDSL) has been shown to be an effective method for pre-training vision transformers, where ExFractalDB-21k was shown to exceed the pre-training effect of ImageNet-21k. These studies also indicate that contours mattered more than textures when pre-training vision transformers. However, the lack of a systematic investigation as to why these contour-oriented synthetic datasets can achieve the same accuracy as real datasets leaves much room for skepticism. In the present work, we develop a novel methodology based on circular harmonics for systematically investigating the design space of contour-oriented synthetic datasets. This allows us to efficiently search the optimal range of FDSL parameters and maximize the variety of synthetic images in the dataset, which we found to be a critical factor. When the resulting new dataset VisualAtom-21k is used for…
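As a rough illustration of what a contour built from sinusoidal waves can look like, the sketch below perturbs the radius of a circle with two sine terms and stacks several such contours concentrically. This listing does not give the authors' generation formula, so the function names, parameters (`n1`, `n2`, `a1`, `a2`, `step`), and default values are all hypothetical; it is a minimal sketch under those assumptions, not the VisualAtom generator.

```python
import math


def wave_contour(radius, n1, n2, a1, a2, num_points=360):
    """Return (x, y) points of a closed contour.

    Assumption (not from the source): the contour is a circle whose radius
    is perturbed by two sinusoidal waves with integer frequencies n1 and n2
    and amplitudes a1 and a2. Integer frequencies keep the curve closed.
    """
    points = []
    for i in range(num_points):
        theta = 2.0 * math.pi * i / num_points
        r = radius + a1 * math.sin(n1 * theta) + a2 * math.sin(n2 * theta)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def make_shape(n1, n2, num_orbits=3, base_radius=1.0, step=0.5, a1=0.1, a2=0.05):
    """Stack concentric contours that share one frequency pair.

    Hypothetical helper: in this sketch, a frequency pair (n1, n2) plays the
    role of a class label, and varying the other parameters would give
    different instances of that class.
    """
    return [
        wave_contour(base_radius + k * step, n1, n2, a1, a2)
        for k in range(num_orbits)
    ]


if __name__ == "__main__":
    shape = make_shape(n1=3, n2=5)
    print(len(shape), "contours,", len(shape[0]), "points each")
```

Varying the frequency pair changes the overall outline, while amplitudes and radii change how pronounced and how large the waves are, which is one simple way to think about widening the variety of contour images.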

Taxonomy

Topics: Advanced Neural Network Applications · Cell Image Analysis Techniques · Domain Adaptation and Few-Shot Learning