Improving Axial-Attention Network Classification via Cross-Channel Weight Sharing
Nazmul Shahadat, Anthony S. Maida

TL;DR
This paper explores replacing layers in Axial Attention networks with hypercomplex-inspired variants, leading to improved image classification accuracy on ImageNet300k across multiple network architectures.
Contribution
It introduces a method of integrating hypercomplex-inspired layers into Axial Attention networks, demonstrating consistent accuracy improvements across various network components.
Findings
Improved accuracy on ImageNet300k with various modifications
Hypercomplex variants enhance representational coherence
Technique is broadly applicable to different network parts
Abstract
In recent years, hypercomplex-inspired neural networks (HCNNs) have been used to improve deep learning architectures due to their ability to enable channel-based weight sharing, treat colors as a single entity, and improve representational coherence within the layers. The work described herein studies the effect of replacing existing layers in an Axial Attention network with their representationally coherent variants to assess the effect on image classification. We experiment with the stem of the network, the bottleneck layers, and the fully connected backend, by replacing them with representationally coherent variants. These various modifications lead to novel architectures which all yield improved accuracy performance on the ImageNet300k classification dataset. Our baseline networks for comparison were the original real-valued ResNet, the original quaternion-valued ResNet, and the…
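The channel-based weight sharing mentioned in the abstract can be illustrated with a quaternion-style linear layer, one well-known hypercomplex construction. The sketch below is not taken from the paper: the function names (`quaternion_weight`, `matvec`, `param_counts`) are hypothetical, and it assumes the input channels are grouped into four equal components. It shows how four small weight blocks are reused, following the Hamilton product, to fill a full-size weight matrix.

```python
# Illustrative sketch only; not the authors' implementation.
# Weight blocks are plain lists of lists, each of shape (out_q, in_q).

def neg(m):
    """Return the element-wise negation of a matrix."""
    return [[-v for v in row] for row in m]

def hstack(blocks):
    """Concatenate matrices with the same number of rows side by side."""
    return [sum((b[r] for b in blocks), []) for r in range(len(blocks[0]))]

def quaternion_weight(r, i, j, k):
    """Build the real (4*out_q, 4*in_q) matrix that left-multiplies an input
    [a; b; c; d] by the quaternion weight r + i*I + j*J + k*K.
    Every block r, i, j, k appears four times: this reuse is the weight sharing."""
    layout = [
        [r, neg(i), neg(j), neg(k)],
        [i, r, neg(k), j],
        [j, k, r, neg(i)],
        [k, neg(j), i, r],
    ]
    rows = []
    for block_row in layout:
        rows.extend(hstack(block_row))
    return rows

def matvec(w, x):
    """Multiply matrix w by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]

def param_counts(in_features, out_features):
    """Free parameters of an unconstrained linear map vs. the shared version."""
    full = in_features * out_features
    shared = 4 * (in_features // 4) * (out_features // 4)
    return full, shared
```

Under these assumptions, a map from 8 to 12 features has 96 free weights when unconstrained but only 24 when built from shared quaternion blocks, since each block is reused across all four channel groups.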
Taxonomy
Topics: Image Processing Techniques and Applications · Cell Image Analysis Techniques · Neural Networks and Reservoir Computing
