CTA-Net: A CNN-Transformer Aggregation Network for Improving Multi-Scale Feature Extraction

Chunlei Meng; Jiacheng Yang; Wei Lin; Bowen Liu; Hongda Zhang; Chun Ouyang; Zhongxue Gan

arXiv:2410.11428 · cs.CV · October 16, 2024 · 3 cites

Open Access

TL;DR

CTA-Net combines CNNs and vision transformers through a lightweight multi-scale feature fusion module, integrating local and global features to improve both accuracy and efficiency on small-scale datasets.

Contribution

The paper introduces CTA-Net, a CNN-Transformer aggregation network built around two new components: a lightweight multi-scale feature fusion self-attention module (LMF-MHSA) and a Reverse Reconstruction CNN-Variants (RRCV) module. Together they aim to improve efficiency and performance on small datasets.

Findings

01

Achieves 86.76% top-1 accuracy on small-scale datasets.

02

Reduces parameters to 20.32 million.

03

Operates with 2.83 billion FLOPs, demonstrating efficiency.

Abstract

Convolutional neural networks (CNNs) and vision transformers (ViTs) have become essential in computer vision for local and global feature extraction. However, aggregating these architectures in existing methods often results in inefficiencies. To address this, the CNN-Transformer Aggregation Network (CTA-Net) was developed. CTA-Net combines CNNs and ViTs, with transformers capturing long-range dependencies and CNNs extracting localized features. This integration enables efficient processing of detailed local and broader contextual information. CTA-Net introduces the Light Weight Multi-Scale Feature Fusion Multi-Head Self-Attention (LMF-MHSA) module for effective multi-scale feature integration with reduced parameters. Additionally, the Reverse Reconstruction CNN-Variants (RRCV) module enhances the embedding of CNNs within the transformer architecture. Extensive experiments on…
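The abstract describes attention that integrates features at several scales while keeping the module light. As a rough illustration of that general idea only, the sketch below lets every token attend over average-pooled summaries of the sequence at two window sizes. It is not the paper's LMF-MHSA module: the function names, the 1-D token layout, the window sizes, and the use of identity projections for queries, keys, and values are all assumptions made for the example.

```python
import math


def avg_pool(tokens, window):
    """Average non-overlapping groups of `window` token vectors."""
    dim = len(tokens[0])
    pooled = []
    for start in range(0, len(tokens) - window + 1, window):
        group = tokens[start:start + window]
        pooled.append([sum(vec[d] for vec in group) / window for d in range(dim)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_scale_attention(tokens, windows=(2, 4)):
    """Hypothetical multi-scale attention sketch (not the paper's module).

    Queries are the raw tokens. Keys and values are the concatenation of
    pooled summaries at each window size. Identity projections are assumed,
    so no learned weights appear here.
    """
    dim = len(tokens[0])
    context = [vec for w in windows for vec in avg_pool(tokens, w)]
    outputs = []
    for query in tokens:
        # Scaled dot-product score between this query and every summary.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in context
        ]
        weights = softmax(scores)
        # Output is the attention-weighted mix of the summaries.
        outputs.append(
            [sum(w * vec[d] for w, vec in zip(weights, context)) for d in range(dim)]
        )
    return outputs, len(context)


# Eight toy tokens of dimension 2; the second coordinate is constant.
tokens = [[float(i), 1.0] for i in range(8)]
outputs, num_context = multi_scale_attention(tokens)
```

With eight tokens and windows of 2 and 4, each query scores against six pooled summaries instead of eight raw tokens, which is the kind of saving a pooled key/value set can offer. How CTA-Net actually fuses scales and where its parameter reduction comes from is specified in the paper, not here.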

Taxonomy

Topics: Neural Networks and Applications · Image Processing and 3D Reconstruction