Conviformers: Convolutionally guided Vision Transformer

Mohit Vaishnav; Thomas Fel; Iván Felipe Rodríguez; Thomas Serre

arXiv:2208.08900 · cs.CV · August 31, 2022 · 1 citation

TL;DR

This paper introduces Conviformer, a convolutionally guided Vision Transformer that effectively handles high-resolution images for fine-grained plant classification, achieving state-of-the-art results with improved preprocessing and augmentation techniques.

Contribution

The paper presents Conviformer, a novel convolutional transformer architecture capable of processing higher resolution images efficiently for fine-grained classification tasks.

Findings

01. Conviformer outperforms existing models on the Herbarium 202x and iNaturalist 2019 datasets.

02. PreSizer improves image resizing, preserving aspect ratios for better classification.

03. Enhanced augmentation techniques contribute to higher accuracy.
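To make the aspect-ratio point in the PreSizer finding concrete, here is a minimal sketch of the geometry behind an aspect-ratio-preserving resize. It is not the authors' PreSizer implementation: the function name `presize_geometry`, the square target, and the choice to centre the image and pad the shorter side are all assumptions made for illustration.

```python
def presize_geometry(width, height, target):
    """Scale (width, height) to fit inside a target x target square
    without distorting the aspect ratio, then report the padding needed
    to fill the square.

    Hypothetical helper: centring and padding are illustrative
    assumptions, not details taken from the paper.
    Returns (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height and target must be positive")

    # Scale so that the longer side exactly matches the target size.
    scale = target / max(width, height)
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))

    # Split the leftover space as evenly as possible on both sides.
    pad_w = target - new_w
    pad_h = target - new_h
    pad_left = pad_w // 2
    pad_top = pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top
```

For a portrait scan of 1000 x 1500 pixels and a 448-pixel target, `presize_geometry(1000, 1500, 448)` keeps the full height at 448 pixels, shrinks the width to 299 pixels, and pads 74 and 75 pixels on the left and right. A plain resize to 448 x 448 would instead stretch the width and distort the specimen.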

Abstract

Vision transformers are now the de facto choice for image classification tasks. There are two broad categories of classification tasks: fine-grained and coarse-grained. Fine-grained classification requires discovering subtle differences, because sub-classes are highly similar to one another. Such distinctions are often lost when the image is downscaled to save the memory and computational cost associated with vision transformers (ViT). In this work, we present an in-depth analysis and describe the critical components for developing a system for the fine-grained categorization of plants from herbarium sheets. Our extensive experimental analysis indicated the need for a better augmentation technique and the ability of modern-day neural networks to handle higher dimensional images. We also introduce a convolutional transformer architecture called Conviformer which, unlike…
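The memory and compute pressure mentioned in the abstract can be illustrated with a short calculation. This is a sketch under stated assumptions rather than a figure from the paper: it assumes a plain ViT with square inputs, a patch size of 16, and full self-attention, and it ignores the class token. The helper name `vit_token_stats` is hypothetical.

```python
def vit_token_stats(image_size, patch_size=16):
    """Return (tokens, attention_pairs) for a square image.

    Assumes non-overlapping square patches and full self-attention,
    where every token attends to every token. The class token is
    ignored for simplicity.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    tokens = (image_size // patch_size) ** 2
    attention_pairs = tokens ** 2  # entries in one attention matrix
    return tokens, attention_pairs


if __name__ == "__main__":
    for size in (224, 448):
        tokens, pairs = vit_token_stats(size)
        print(size, tokens, pairs)
```

Under these assumptions, doubling the side length from 224 to 448 pixels raises the token count from 196 to 784 and the size of each attention matrix by a factor of 16, which is why high-resolution inputs are expensive for a standard ViT.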

Code & Models

Repositories

vaishnavmohit/Conviformer (PyTorch, official)

Taxonomy

Topics: Smart Agriculture and AI · Advanced Neural Network Applications · Innovations in Aquaponics and Hydroponics Systems

Methods: Attention Is All You Need · Linear Layer · Gated Positional Self-Attention · Dense Connections · ConViT · Label Smoothing · Position-Wise Feed-Forward Layer · Residual Connection · Softmax · Dropout