Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images
Xiangyong Lu, Masanori Suganuma, Takayuki Okatani

TL;DR
This paper introduces a cascaded multi-scale attention mechanism for CNN-ViT hybrid models that improves feature extraction from low-resolution images. On tasks such as human pose estimation, it achieves better results with fewer parameters.
Contribution
The paper presents the cascaded multi-scale attention (CMSA) module, which extracts and integrates multi-scale features without downsampling the input image or feature maps, improving low-resolution image analysis in CNN-ViT hybrid architectures.
Findings
Outperforms state-of-the-art methods in pose estimation tasks
Uses fewer parameters than existing approaches
Enhances feature extraction from low-resolution images
Abstract
In real-world applications of image recognition tasks, such as human pose estimation, cameras often capture objects, like human bodies, at low resolutions. This scenario poses a challenge in extracting and leveraging multi-scale features, which is often essential for precise inference. To address this challenge, we propose a new attention mechanism, named cascaded multi-scale attention (CMSA), tailored for use in CNN-ViT hybrid architectures, to handle low-resolution inputs effectively. The design of CMSA enables the extraction and seamless integration of features across various scales without necessitating the downsampling of the input image or feature maps. This is achieved through a novel combination of grouped multi-head self-attention mechanisms with window-based local attention and cascaded fusion of multi-scale features over different scales. This architecture allows for the…
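The abstract combines three ingredients: grouped multi-head self-attention, window-based local attention, and cascaded fusion across scales. Below is a minimal pure-Python sketch of one way these could fit together. It is not the authors' implementation, and it rests on assumptions that the summary does not confirm: each channel group acts as one head with its own window size (its "scale"), queries, keys and values are the raw features (no learned projections), and the cascade adds each group's output to the next group's input. All function names are hypothetical. Because attention is computed inside windows on the full-resolution map, the output keeps the input's H x W size, and no downsampling is involved.

```python
import math

# Hypothetical sketch: a feature map is a nested list fmap[row][col] -> list of C floats.


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_self_attention(fmap, win):
    """Self-attention restricted to non-overlapping win x win windows.

    Assumption: queries, keys and values are the raw features (no learned
    projections). The output has the same H x W x C shape as the input.
    """
    H, W, C = len(fmap), len(fmap[0]), len(fmap[0][0])
    assert H % win == 0 and W % win == 0, "H and W must be multiples of win"
    out = [[None] * W for _ in range(H)]
    for top in range(0, H, win):
        for left in range(0, W, win):
            # Gather the tokens of one local window.
            coords = [(i, j) for i in range(top, top + win)
                      for j in range(left, left + win)]
            tokens = [fmap[i][j] for i, j in coords]
            for (i, j), q in zip(coords, tokens):
                # Scaled dot-product scores against every token in the window.
                scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(C)
                          for k in tokens]
                weights = softmax(scores)
                # Weighted sum of the window's values.
                out[i][j] = [sum(w * v[c] for w, v in zip(weights, tokens))
                             for c in range(C)]
    return out


def split_channels(fmap, groups):
    """Split the channel dimension into equal, contiguous groups."""
    C = len(fmap[0][0])
    assert C % groups == 0, "C must be divisible by the number of groups"
    size = C // groups
    return [[[px[g * size:(g + 1) * size] for px in row] for row in fmap]
            for g in range(groups)]


def add_maps(a, b):
    """Element-wise sum of two feature maps of equal shape."""
    return [[[x + y for x, y in zip(pa, pb)] for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def concat_channels(maps):
    """Concatenate feature maps along the channel dimension."""
    H, W = len(maps[0]), len(maps[0][0])
    return [[sum((m[i][j] for m in maps), []) for j in range(W)]
            for i in range(H)]


def cascaded_multiscale_attention(fmap, window_sizes):
    """Hypothetical CMSA-style block: one channel group (head) per window size,
    with each group's output added to the next group's input (the cascade)."""
    groups = split_channels(fmap, len(window_sizes))
    outputs, prev = [], None
    for g_in, win in zip(groups, window_sizes):
        x = g_in if prev is None else add_maps(g_in, prev)
        prev = window_self_attention(x, win)
        outputs.append(prev)
    return concat_channels(outputs)


# Usage: a 4 x 4 map with 4 channels, two groups at window sizes 2 and 4.
fmap = [[[float((i * 4 + j + c) % 5) for c in range(4)] for j in range(4)]
        for i in range(4)]
out = cascaded_multiscale_attention(fmap, window_sizes=[2, 4])
print(len(out), len(out[0]), len(out[0][0]))  # 4 4 4
```

In this sketch, the group with the small window mixes only neighboring pixels, while the group with the large window mixes a wider context. The cascade lets the wide-window group start from features the narrow-window group has already refined. Padding for maps that do not tile evenly into windows is omitted for brevity.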
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Image Processing Techniques and Applications
Methods: Softmax · Attention Is All You Need
