Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers
Hasan AlMarzouqi, Lyes Saad Saoud

TL;DR
This paper introduces a novel segmentation model combining CNNs and transformers for high-resolution remote sensing images, enhancing accuracy by integrating multi-modal data and multi-task strategies.
Contribution
It proposes a new hybrid CNN-transformer model with fusion layers for multi-modal input and multi-task segmentation, improving remote sensing image analysis.
Findings
Improved segmentation accuracy over state-of-the-art methods
Effective multi-modal data integration via fusion layers
Enhanced global and local feature learning
Abstract
Semantic segmentation necessitates approaches that learn high-level characteristics while dealing with enormous amounts of data. Convolutional neural networks (CNNs) can learn unique and adaptive features to achieve this aim. However, due to the large size and high spatial resolution of remote sensing images, these networks cannot analyze an entire scene efficiently. Recently, deep transformers have proven their capability to record global interactions between different objects in the image. In this paper, we propose a new segmentation model that combines convolutional neural networks with transformers, and show that this mixture of local and global feature extraction techniques provides significant advantages in remote sensing segmentation. In addition, the proposed model includes two fusion layers that are designed to represent multi-modal inputs and output of the network efficiently.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Image and Video Retrieval Techniques · Remote-Sensing Image Classification · Advanced Neural Network Applications
