GradAttn: Replacing Fixed Residual Connections with Task-Modulated Attention Pathways
Soudeep Ghoshal, Himanshu Buckchash

TL;DR
GradAttn introduces a hybrid CNN-transformer model that replaces fixed residuals with attention-controlled gradient flow, enhancing feature learning and accuracy across diverse datasets.
Contribution
It proposes a novel framework that dynamically modulates gradient flow using attention, outperforming traditional residual connections in deep CNNs.
Findings
GradAttn outperforms ResNet-18 on five of eight datasets.
Achieves up to +11.07% accuracy improvement on FashionMNIST.
Attention-controlled gradient flow can improve generalization despite controlled instabilities.
Abstract
Deep ConvNets suffer from gradient signal degradation as network depth increases, which limits effective feature learning in complex architectures. ResNet addressed this with residual connections, but these fixed short-circuits cannot adapt to varying input complexity or selectively emphasize task-relevant features across network hierarchies. This study introduces GradAttn, a hybrid CNN-transformer framework that replaces fixed residual connections with attention-controlled gradient flow. By extracting multi-scale CNN features at different depths and regulating them through self-attention, GradAttn dynamically weights shallow texture features and deep semantic representations. For representational analysis, we evaluated three GradAttn variants across eight diverse datasets spanning natural images, medical imaging, and fashion recognition. Results demonstrate that GradAttn outperforms…
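The abstract describes the mechanism only at a high level. The sketch below illustrates the general idea under stated assumptions and is not the authors' implementation. A residual connection adds features with a fixed weight of one. Here, each CNN stage's output becomes one token, and single-head scaled dot-product self-attention mixes those tokens with input-dependent weights, so the backward signal reaching each stage passes through learned, data-dependent weights instead of a fixed identity path. Several details are hypothetical choices made for illustration: global average pooling per stage, the per-stage projections `proj`, the shared width `d`, the ResNet-18-like stage shapes, and the helper names `gap` and `attention_fuse`.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gap(feature_map):
    """Global average pool a (C, H, W) feature map to a (C,) vector (assumed pooling)."""
    return feature_map.mean(axis=(1, 2))

def attention_fuse(stage_features, proj, w_q, w_k, w_v):
    """Hypothetical fusion: treat each CNN stage (shallow to deep) as one token
    and mix the tokens with single-head scaled dot-product self-attention."""
    # Project each pooled stage vector to a shared width d -> (K, d).
    tokens = np.stack([gap(f) @ p for f, p in zip(stage_features, proj)])
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    d = q.shape[-1]
    # (K, K) attention weights; each row sums to 1 and depends on the input.
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    fused = weights @ v                     # (K, d) re-weighted stage tokens
    return fused.mean(axis=0), weights      # pooled (d,) descriptor, weights

# Toy usage: random stand-ins for four stage outputs with ResNet-18-like widths.
rng = np.random.default_rng(0)
channels, sizes, d = [64, 128, 256, 512], [32, 16, 8, 4], 64
feats = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
proj = [rng.standard_normal((c, d)) / np.sqrt(c) for c in channels]
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
fused, weights = attention_fuse(feats, proj, w_q, w_k, w_v)
print(fused.shape, weights.shape)  # (64,) (4, 4)
```

Inspecting `weights` shows how strongly each stage draws on shallow (texture) versus deep (semantic) tokens for a given input. A fixed residual sum cannot express this kind of input-dependent emphasis.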
