FViT: A Focal Vision Transformer with Gabor Filter

Yulong Shi; Mingwei Sun; Yongshuai Wang; Zengqiang Chen

arXiv:2402.11303 · cs.CV · January 22, 2025 · 1 cites

Open Access · 1 Repo

TL;DR

FViT introduces a novel vision transformer architecture that integrates learnable Gabor filters and biologically inspired modules to improve efficiency and performance in dense vision tasks.

Contribution

This paper proposes FViT, a new vision transformer framework combining Gabor filters and neuroscience-inspired blocks for better feature focus and reduced complexity.

Findings

01. FViT outperforms existing models in multiple vision tasks.
02. FViT demonstrates higher computational efficiency and scalability.
03. The biologically inspired design improves feature discrimination across scales.

Abstract

Vision transformers have achieved encouraging progress in various computer vision tasks. A common belief is that this is attributed to the capability of self-attention in modeling the global dependencies among feature tokens. However, self-attention still faces several challenges in dense prediction tasks, including high computational complexity and absence of desirable inductive bias. To alleviate these issues, the potential advantages of combining vision transformers with Gabor filters are revisited, and a learnable Gabor filter (LGF) using convolution is proposed. The LGF does not rely on self-attention, and it is used to simulate the response of fundamental cells in the biological visual system to the input images. This encourages vision transformers to focus on discriminative feature representations of targets across different scales and orientations. In addition, a Bionic Focal…
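
To make the idea of a convolutional Gabor filter concrete, the following is a minimal sketch of the standard real-valued 2D Gabor function sampled as a small kernel, plus a bank of four orientations. It is not the authors' LGF: the function names, parameter names (sigma, theta, lam, gamma, psi), and all values are assumptions chosen for illustration, and the parameters are fixed here, whereas a learnable variant would treat them as trainable weights.

```python
import math


def gabor_kernel(size, sigma, theta, lam, gamma=1.0, psi=0.0):
    """Sample the real part of a 2D Gabor function on a size x size grid.

    size  : odd kernel width/height (assumed odd so the grid has a center)
    sigma : standard deviation of the Gaussian envelope
    theta : orientation of the sinusoidal carrier, in radians
    lam   : wavelength of the carrier, in pixels
    gamma : spatial aspect ratio of the envelope
    psi   : phase offset of the carrier
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lam + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def response(patch, kernel):
    """Elementwise product-and-sum: one output position of a convolution."""
    return sum(p * k for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))


# Hypothetical filter bank: four orientations at a single scale.
thetas = [k * math.pi / 4 for k in range(4)]
bank = [gabor_kernel(5, sigma=2.0, theta=t, lam=4.0) for t in thetas]

# Toy 5x5 patch of vertical stripes with a 4-pixel period along x.
patch = [[math.cos(2 * math.pi * (x - 2) / 4) for x in range(5)]
         for _ in range(5)]

responses = [response(patch, k) for k in bank]
for t, r in zip(thetas, responses):
    print(f"theta = {t:.2f} rad, response = {r:.3f}")
```

On this toy patch the filter whose carrier matches the stripe direction (theta = 0) responds far more strongly than the perpendicular one (theta = pi/2), which is the orientation selectivity the abstract refers to. Varying sigma and lam in the same way would give a bank that also spans scales.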

Code & Models

Repositories

nkusyl/fvit (PyTorch, official)

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies

Methods: Attention Is All You Need · Convolution · Linear Layer · Softmax · Multi-Head Attention · Layer Normalization · Residual Connection · Dense Connections · Focus · Vision Transformer