Sequencer: Deep LSTM for Image Classification

Yuki Tatsunami; Masato Taki

arXiv:2205.01972 · cs.CV · January 13, 2023 · 54 cites

TL;DR

Sequencer introduces a novel LSTM-based architecture for image classification, rivaling Vision Transformers by modeling long-range dependencies without self-attention, and demonstrates strong performance on ImageNet-1K.

Contribution

This paper presents Sequencer, a new LSTM-based architecture for vision tasks, offering an alternative to self-attention models like ViT with competitive accuracy.

Findings

1. Sequencer2D-L achieves 84.6% top-1 accuracy on ImageNet-1K.
2. The model demonstrates good transferability to other datasets.
3. It maintains robust performance across different input resolutions.

Abstract

In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly revolutionized various architectural design efforts: ViT achieved state-of-the-art image classification performance using self-attention found in natural language processing, and MLP-Mixer achieved competitive performance using simple multi-layer perceptrons. In contrast, several studies have also suggested that carefully redesigned convolutional neural networks (CNNs) can achieve advanced performance comparable to ViT without resorting to these new ideas. Against this background, there is growing interest in what inductive bias is suitable for computer vision. Here we propose Sequencer, a novel and competitive architecture alternative to ViT that provides a new perspective on these issues. Unlike ViTs, Sequencer models long-range dependencies using LSTMs rather than self-attention layers. We also…
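
The abstract excerpt says only that Sequencer models long-range dependencies with LSTMs instead of self-attention, and it is cut off before any layer details. As a rough illustration of that idea, and not the authors' implementation, the NumPy sketch below runs a bidirectional LSTM along every row and every column of a grid of patch embeddings, so each patch can receive information from distant patches in its row and column. The helper names (`init_lstm`, `lstm`, `bilstm`, `mix_patches`), the sizes, the random weights, and the choice to concatenate row and column outputs before a linear projection are all assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(input_dim, hidden_dim, rng):
    """Hypothetical helper: random weights for one LSTM direction."""
    scale = 1.0 / np.sqrt(hidden_dim)
    return {
        "W": rng.uniform(-scale, scale, (input_dim + hidden_dim, 4 * hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }

def lstm(seqs, params):
    """Run an LSTM over seqs of shape (batch, steps, input_dim)."""
    batch, steps, _ = seqs.shape
    hidden_dim = params["b"].size // 4
    h = np.zeros((batch, hidden_dim))
    c = np.zeros((batch, hidden_dim))
    outputs = []
    for t in range(steps):
        z = np.concatenate([seqs[:, t], h], axis=1) @ params["W"] + params["b"]
        i, f, g, o = np.split(z, 4, axis=1)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1)  # (batch, steps, hidden_dim)

def bilstm(seqs, fwd, bwd):
    """Concatenate a forward pass with a pass over the reversed sequence."""
    out_f = lstm(seqs, fwd)
    out_b = lstm(seqs[:, ::-1], bwd)[:, ::-1]
    return np.concatenate([out_f, out_b], axis=-1)

def mix_patches(grid, p):
    """grid: (H, W, C) patch embeddings; returns an array of the same shape."""
    # Rows as sequences: batch = H, steps = W.
    rows = bilstm(grid, p["row_fwd"], p["row_bwd"])
    # Columns as sequences: batch = W, steps = H, then transpose back.
    cols = bilstm(grid.transpose(1, 0, 2), p["col_fwd"], p["col_bwd"]).transpose(1, 0, 2)
    mixed = np.concatenate([rows, cols], axis=-1) @ p["proj"]
    return grid + mixed  # residual connection

rng = np.random.default_rng(0)
C, D = 8, 4  # channel and hidden sizes, chosen arbitrarily for the sketch
params = {k: init_lstm(C, D, rng) for k in ("row_fwd", "row_bwd", "col_fwd", "col_bwd")}
params["proj"] = rng.normal(0.0, 0.1, (4 * D, C))

grid = rng.normal(size=(4, 4, C))
small = mix_patches(grid, params)
large = mix_patches(rng.normal(size=(6, 9, C)), params)
print(small.shape, large.shape)
```

Because the recurrence simply runs along whatever sequence length it is given, the same weights apply unchanged to the 4×4 and the 6×9 grid. In this single-layer sketch a patch only sees its own row and column; stacking such layers would be one way to let information spread across the whole grid.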

Code & Models

Videos

Sequencer: Deep LSTM for Image Classification · slideslive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: Average Pooling · Dropout · Global Average Pooling · Sigmoid Activation · Tanh Activation · Dense Connections · Residual Connection · Layer Normalization · Vision Transformer