GradAttn: Replacing Fixed Residual Connections with Task-Modulated Attention Pathways

Soudeep Ghoshal; Himanshu Buckchash

arXiv:2603.26756·cs.CV·March 31, 2026

TL;DR

GradAttn introduces a hybrid CNN-transformer model that replaces fixed residuals with attention-controlled gradient flow, enhancing feature learning and accuracy across diverse datasets.

Contribution

The paper proposes a framework that uses attention to modulate gradient flow dynamically, and reports that it outperforms a ResNet-18 baseline with fixed residual connections on five of eight datasets.

Findings

1. GradAttn outperforms ResNet-18 on five of eight datasets.

2. GradAttn achieves up to +11.07% accuracy improvement on FashionMNIST.

3. Attention-controlled gradient flow can improve generalization despite controlled instabilities.

Abstract

Deep ConvNets suffer from gradient signal degradation as network depth increases, limiting effective feature learning in complex architectures. ResNet addressed this through residual connections, but these fixed short-circuits cannot adapt to varying input complexity or selectively emphasize task-relevant features across network hierarchies. This study introduces GradAttn, a hybrid CNN-transformer framework that replaces fixed residual connections with attention-controlled gradient flow. By extracting multi-scale CNN features at different depths and regulating them through self-attention, GradAttn dynamically weights shallow texture features and deep semantic representations. For representational analysis, we evaluated three GradAttn variants across eight diverse datasets, ranging from natural images and medical imaging to fashion recognition. Results demonstrate that GradAttn outperforms…
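
The abstract's central idea, weighting features from several CNN depths through self-attention instead of adding them with a fixed residual, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the function names (`fuse_multiscale`, `residual_fuse`), the assumption that each depth is pooled and projected to one token of a common width `d`, the single attention head, and the mean over attended tokens are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def residual_fuse(features):
    """Fixed-residual baseline: every depth contributes with constant weight 1."""
    return features.sum(axis=0)


def fuse_multiscale(features, w_q, w_k, w_v):
    """Single-head self-attention over one token per CNN depth (assumed layout).

    features: array of shape (num_depths, d), one pooled token per depth.
    Returns the fused vector of shape (d,) and the attention weights of
    shape (num_depths, num_depths).
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    # Scaled dot-product scores between every pair of depths.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Each row sums to 1: how much one depth attends to every depth.
    weights = softmax(scores, axis=-1)
    attended = weights @ v
    # Hypothetical readout: average the attended tokens into one vector.
    fused = attended.mean(axis=0)
    return fused, weights


# Toy input: 3 depths (shallow, middle, deep), each already reduced to d = 8.
rng = np.random.default_rng(0)
num_depths, d = 3, 8
features = rng.normal(size=(num_depths, d))
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

fused, weights = fuse_multiscale(features, w_q, w_k, w_v)
baseline = residual_fuse(features)
```

In this sketch the rows of `weights` depend on the input, so the contribution of shallow and deep features changes per example, whereas `residual_fuse` always mixes them with the same constant weights. In a trained model the projection matrices would be learned, which is what would let the mixing become task-dependent.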
