iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency

Haruna Yunusa; Adamu Lawan; Abdulganiyu Abdu Yusuf

arXiv:2407.07603 · cs.CV · March 31, 2026 · 1 citation

TL;DR

iiANET is a hybrid visual backbone that efficiently combines global self-attention and local convolutional features to better capture long-range dependencies in complex images.

Contribution

The paper introduces iiABlock, a novel building block that integrates modified global r-MHSA and convolutional layers for improved feature extraction.

Findings

01. Achieves state-of-the-art performance on visual recognition benchmarks.

02. Effectively captures both global context and local details.

03. Maintains computational efficiency while enhancing feature modeling.

Abstract

The recent emergence of hybrid models has introduced a transformative approach to computer vision, gradually moving beyond conventional convolutional neural networks and vision transformers. However, efficiently combining these two approaches to better capture long-range dependencies in complex images remains a challenge. In this paper, we present iiANET (Inception Inspired Attention Network), an efficient hybrid visual backbone designed to improve the modeling of long-range dependencies in complex visual recognition tasks. The core innovation of iiANET is the iiABlock, a unified building block that integrates a modified global r-MHSA (Multi-Head Self-Attention) and convolutional layers in parallel. This design enables iiABlock to simultaneously capture global context and local details, making it effective for extracting rich and diverse features. By efficiently fusing these…
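To make the parallel design concrete, here is a minimal NumPy sketch of a block that runs a global self-attention branch and a local convolution branch side by side on the same feature map. It is an illustration under stated assumptions, not the authors' iiABlock: the abstract does not specify the modification behind r-MHSA, so plain multi-head self-attention stands in for it, and the fusion step (channel concatenation, a linear projection, and a residual connection) is likewise assumed. All function and parameter names (`global_mhsa`, `conv3x3`, `parallel_block`, `init_params`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_mhsa(x, wq, wk, wv, num_heads):
    """Plain multi-head self-attention over all H*W positions.

    Assumption: this is a stand-in for the paper's modified r-MHSA,
    whose details are not given in the abstract.
    """
    h, w, c = x.shape
    n, d = h * w, c // num_heads
    tokens = x.reshape(n, c)

    def split_heads(t):  # (N, C) -> (heads, N, d)
        return t.reshape(n, num_heads, d).transpose(1, 0, 2)

    q, k, v = (split_heads(tokens @ m) for m in (wq, wk, wv))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)
    return out.reshape(h, w, c)


def conv3x3(x, kernel):
    """Zero-padded 3x3 convolution; kernel has shape (3, 3, C_in, C_out)."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernel.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] @ kernel[dy, dx]
    return out


def parallel_block(x, params, num_heads=2):
    """Run both branches on the same input, then fuse (assumed fusion)."""
    global_feat = global_mhsa(x, params["wq"], params["wk"], params["wv"], num_heads)
    local_feat = conv3x3(x, params["conv"])
    fused = np.concatenate([global_feat, local_feat], axis=-1) @ params["proj"]
    return x + fused  # residual connection (assumption)


def init_params(channels, seed=0):
    # Random weights for illustration only.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(channels)
    return {
        "wq": rng.standard_normal((channels, channels)) * scale,
        "wk": rng.standard_normal((channels, channels)) * scale,
        "wv": rng.standard_normal((channels, channels)) * scale,
        "conv": rng.standard_normal((3, 3, channels, channels)) * scale / 3.0,
        "proj": rng.standard_normal((2 * channels, channels)) * scale,
    }
```

Calling `parallel_block(x, init_params(8))` on an input of shape `(4, 4, 8)` returns a feature map of the same shape. In this sketch, perturbing a far-away pixel changes the attention branch's output at a given position but leaves the 3x3 convolution branch's output there untouched, which is the global-versus-local split the abstract describes.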
