Multi-Dimensional Hyena for Spatial Inductive Bias

Itamar Zimerman; Lior Wolf

arXiv:2309.13600 · cs.CV · September 26, 2023 · 1 citation

TL;DR

This paper introduces a data-efficient vision transformer built on a novel Hyena N-D layer in place of self-attention. The layer improves performance, especially on small datasets and when combined with standard attention layers.

Contribution

The paper proposes a new Hyena N-D layer for vision transformers, improving data efficiency and performance without self-attention, and explores hybrid models combining Hyena and attention mechanisms.

Findings

1. The Hyena N-D layer improves various Vision Transformer architectures.
2. A Hyena-based ViT outperforms recent models designed for small datasets.
3. Hybrid Hyena-attention models further boost overall performance.

Abstract

In recent years, Vision Transformers have attracted increasing interest from computer vision researchers. However, their advantage over CNNs is only fully manifested when they are trained on a large dataset, mainly because the transformer's self-attention mechanism has a weaker inductive bias toward spatial locality. In this work, we present a data-efficient vision transformer that does not rely on self-attention. Instead, it employs a novel generalization of the recent Hyena layer to multiple axes. We propose several alternative approaches for obtaining this generalization and examine their distinctions and trade-offs from both empirical and theoretical perspectives. Our empirical findings indicate that the proposed Hyena N-D layer boosts the performance of various Vision Transformer architectures, such as ViT, Swin, and DeiT across multiple…
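
The abstract does not spell out how the Hyena N-D layer is constructed, and the paper compares several alternatives. Purely as an illustration of the general idea, here is a minimal NumPy sketch of one plausible 2-D Hyena-style operator: input projections split into a value stream and gate streams, a recurrence that alternates a global 2-D long convolution (computed with an FFT) and element-wise gating, and convolution filters generated implicitly from positional features. Everything here is an assumption, not the authors' method: the function names (`positional_features`, `implicit_filter`, `fft_conv2d`, `hyena_2d`, `init_params`) are hypothetical, the filter MLP is random and untrained, the exponential decay window stands in for a locality prior, the convolution is a one-quadrant (top-left) linear convolution chosen for simplicity, and Hyena's short depthwise convolutions and bias terms are omitted.

```python
import numpy as np


def positional_features(h, w, num_freqs=4):
    """Sinusoidal features of normalized 2-D coordinates, shape (h, w, 2 + 4*num_freqs)."""
    yy, xx = np.meshgrid(np.linspace(0.0, 1.0, h), np.linspace(0.0, 1.0, w), indexing="ij")
    feats = [yy, xx]
    for k in range(1, num_freqs + 1):
        feats += [np.sin(np.pi * k * yy), np.cos(np.pi * k * yy),
                  np.sin(np.pi * k * xx), np.cos(np.pi * k * xx)]
    return np.stack(feats, axis=-1)


def implicit_filter(h, w, channels, rng, hidden=16, decay=3.0):
    """Hypothetical implicit filter: a tiny random MLP maps each position to per-channel taps."""
    pos = positional_features(h, w)
    w1 = rng.standard_normal((pos.shape[-1], hidden)) / np.sqrt(pos.shape[-1])
    w2 = rng.standard_normal((hidden, channels)) / np.sqrt(hidden)
    taps = np.sin(pos @ w1) @ w2  # (h, w, channels)
    # Assumed locality prior: taps decay with distance from the origin tap.
    window = np.exp(-decay * np.hypot(pos[..., 0], pos[..., 1]))
    return taps * window[..., None]


def fft_conv2d(x, k):
    """Per-channel linear 2-D convolution via zero-padded FFT; keeps the top-left (h, w) block."""
    h, w, _ = x.shape
    # Padding to (2h, 2w) avoids circular wrap-around.
    fx = np.fft.rfft2(x, s=(2 * h, 2 * w), axes=(0, 1))
    fk = np.fft.rfft2(k, s=(2 * h, 2 * w), axes=(0, 1))
    y = np.fft.irfft2(fx * fk, s=(2 * h, 2 * w), axes=(0, 1))
    return y[:h, :w, :]


def init_params(h, w, d, order=2, seed=0):
    """Random (untrained) projection weights plus one implicit filter per gate."""
    rng = np.random.default_rng(seed)
    return {
        "w_in": rng.standard_normal((d, (order + 1) * d)) / np.sqrt(d),
        "w_out": rng.standard_normal((d, d)) / np.sqrt(d),
        "filters": [implicit_filter(h, w, d, rng) for _ in range(order)],
    }


def hyena_2d(u, params):
    """Hyena-style recurrence on a (h, w, d) grid: z_1 = v, z_{n+1} = x_n * (h_n conv z_n)."""
    order = len(params["filters"])
    streams = np.split(u @ params["w_in"], order + 1, axis=-1)
    z, gates = streams[0], streams[1:]
    for gate, filt in zip(gates, params["filters"]):
        z = gate * fft_conv2d(z, filt)  # global spatial mixing, then element-wise gating
    return z @ params["w_out"]


u = np.random.default_rng(1).standard_normal((8, 8, 4))  # an 8x8 grid of 4-dim tokens
params = init_params(8, 8, 4)
print(hyena_2d(u, params).shape)  # (8, 8, 4)
```

Because the long convolution runs through an FFT, its cost grows roughly as O(HW log HW) per channel for an H×W token grid, whereas self-attention over the same HW tokens scales quadratically in HW.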

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · CCD and CMOS Imaging Sensors

Methods: Multi-Head Attention · Attention Is All You Need · Attention Dropout · Feedforward Network · Data-efficient Image Transformer · Layer Normalization · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings