CTA-Net: A CNN-Transformer Aggregation Network for Improving Multi-Scale Feature Extraction
Chunlei Meng, Jiacheng Yang, Wei Lin, Bowen Liu, Hongda Zhang, Chun Ouyang, Zhongxue Gan

TL;DR
CTA-Net combines CNNs and vision transformers through a lightweight multi-scale feature fusion module. By integrating local and global features, it improves both accuracy and efficiency on small-scale datasets.
Contribution
The paper introduces CTA-Net, a CNN-Transformer aggregation network with two new components: a multi-scale feature fusion attention module (LMF-MHSA) and a reverse reconstruction module (RRCV). Together they improve efficiency and performance on small-scale datasets.
Findings
Achieves 86.76% Top-1 accuracy on small-scale datasets.
Uses 20.32 million parameters.
Requires 2.83 billion FLOPs.
Abstract
Convolutional neural networks (CNNs) and vision transformers (ViTs) have become essential in computer vision for local and global feature extraction. However, aggregating these architectures in existing methods often results in inefficiencies. To address this, the CNN-Transformer Aggregation Network (CTA-Net) was developed. CTA-Net combines CNNs and ViTs, with transformers capturing long-range dependencies and CNNs extracting localized features. This integration enables efficient processing of detailed local and broader contextual information. CTA-Net introduces the Light Weight Multi-Scale Feature Fusion Multi-Head Self-Attention (LMF-MHSA) module for effective multi-scale feature integration with reduced parameters. Additionally, the Reverse Reconstruction CNN-Variants (RRCV) module enhances the embedding of CNNs within the transformer architecture. Extensive experiments on…
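The abstract does not describe how LMF-MHSA or RRCV are built. As a rough illustration of the general idea of pairing a convolutional branch for local features with a multi-scale self-attention branch for global context, here is a minimal NumPy sketch under stated assumptions. All names (`avg_pool`, `multiscale_mhsa`, `depthwise_conv3x3`, `hybrid_block`), the pooling scales, and the residual-sum fusion are hypothetical choices made for this example. This is not the paper's LMF-MHSA or RRCV module. In the sketch, queries come from every spatial position, while keys and values come from average-pooled copies of the feature map at several scales. This shrinks the attention matrix from N×N to N×M, with M much smaller than N.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool(fmap, s):
    """Average-pool an (H, W, C) map with non-overlapping s x s windows."""
    H, W, C = fmap.shape
    cropped = fmap[: H // s * s, : W // s * s]
    return cropped.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))

def multiscale_mhsa(fmap, wq, wk, wv, wo, num_heads, scales=(2, 4)):
    """Hypothetical multi-scale MHSA: full-resolution queries,
    keys/values taken from pooled maps at each scale (an assumption)."""
    H, W, C = fmap.shape
    d = C // num_heads
    tokens = fmap.reshape(H * W, C)
    # Key/value tokens: pooled maps at every scale, flattened and concatenated.
    kv = np.concatenate([avg_pool(fmap, s).reshape(-1, C) for s in scales], axis=0)
    q = (tokens @ wq).reshape(H * W, num_heads, d).transpose(1, 0, 2)  # (heads, N, d)
    k = (kv @ wk).reshape(-1, num_heads, d).transpose(1, 0, 2)         # (heads, M, d)
    v = (kv @ wv).reshape(-1, num_heads, d).transpose(1, 0, 2)         # (heads, M, d)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)   # (heads, N, M)
    out = (attn @ v).transpose(1, 0, 2).reshape(H * W, C)             # merge heads
    return (out @ wo).reshape(H, W, C)

def depthwise_conv3x3(fmap, kernel):
    """Depthwise 3x3 convolution with zero padding; kernel shape is (3, 3, C)."""
    H, W, C = fmap.shape
    padded = np.pad(fmap, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(fmap)
    for i in range(3):
        for j in range(3):
            out += padded[i:i + H, j:j + W] * kernel[i, j]
    return out

def hybrid_block(fmap, dw_kernel, wq, wk, wv, wo, num_heads, scales=(2, 4)):
    """Residual sum of a local (conv) branch and a global (attention) branch."""
    local = depthwise_conv3x3(fmap, dw_kernel)
    global_ = multiscale_mhsa(fmap, wq, wk, wv, wo, num_heads, scales)
    return fmap + local + global_

# Usage: an 8x8 map with 16 channels and 4 heads.
rng = np.random.default_rng(0)
H, W, C, heads = 8, 8, 16, 4
x = rng.standard_normal((H, W, C))
wq, wk, wv, wo = (rng.standard_normal((C, C)) * C ** -0.5 for _ in range(4))
kernel = rng.standard_normal((3, 3, C)) * 0.1
y = hybrid_block(x, kernel, wq, wk, wv, wo, num_heads=heads)
print(y.shape)  # (8, 8, 16)
```

With scales 2 and 4 on an 8×8 map, the attention uses 16 + 4 = 20 key/value tokens instead of 64. This is one common way to cut attention cost. Whether CTA-Net reduces parameters and FLOPs in a similar way is not stated in the abstract.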
Taxonomy
Topics: Neural Networks and Applications · Image Processing and 3D Reconstruction
