Swin Transformer coupling CNNs Makes Strong Contextual Encoders for VHR Image Road Extraction
Tao Chen, Yiran Liu, Haoyu Jiang, Ruirui Li

TL;DR
This paper introduces ConSwin, a dual-branch network block that combines ResNet and Swin Transformer for road extraction in very-high-resolution (VHR) images. The authors report that it outperforms existing methods in accuracy and detail.
Contribution
The paper proposes a novel ConSwin block that integrates CNN and Transformer features, along with an hourglass network and new connection structures for enhanced road segmentation.
Findings
Outperforms state-of-the-art methods on multiple datasets.
Achieves higher accuracy, IoU, and F1 scores than the compared methods.
Demonstrates better texture and structural detail extraction.
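For readers unfamiliar with the metrics named above, the following is a minimal sketch of how pixel accuracy, IoU, and F1 are commonly computed for a binary road mask. The function name `road_metrics` and the convention of returning 1.0 when neither mask contains road pixels are assumptions made for illustration; this summary does not state the paper's exact evaluation protocol.

```python
def road_metrics(pred, truth):
    """Compute pixel accuracy, IoU, and F1 for two binary masks.

    pred, truth: equally sized 2D lists of 0/1 values (1 = road).
    Hypothetical helper; standard definitions, not the paper's code.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # road predicted, road present
            elif p and not t:
                fp += 1          # road predicted, background present
            elif t and not p:
                fn += 1          # road missed
            else:
                tn += 1          # background correctly predicted

    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("masks must contain at least one pixel")

    road_pixels = tp + fp + fn
    # Assumed convention: perfect score when there is no road in either mask.
    iou = tp / road_pixels if road_pixels else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if road_pixels else 1.0
    accuracy = (tp + tn) / total
    return {"accuracy": accuracy, "iou": iou, "f1": f1}
```

On thin structures such as roads, background pixels dominate, so accuracy can stay high even when many road pixels are missed. This is one reason IoU and F1, which ignore true negatives, are usually reported alongside it.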
Abstract
Accurately segmenting roads is challenging due to substantial intra-class variations, indistinct inter-class distinctions, and occlusions caused by shadows, trees, and buildings. To address these challenges, attention to important texture details and perception of global geometric contextual information are essential. Recent research has shown that CNN-Transformer hybrid structures outperform using CNN or Transformer alone. While CNN excels at extracting local detail features, the Transformer naturally perceives global contextual information. In this paper, we propose a dual-branch network block named ConSwin that combines ResNet and SwinTransformers for road extraction tasks. This ConSwin block harnesses the strengths of both approaches to better extract detailed and global features. Based on ConSwin, we construct an hourglass-shaped road extraction network and introduce two novel…
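The local-versus-global contrast described in the abstract can be illustrated with a toy NumPy sketch. It is not the ConSwin block: the abstract does not say how the two branches are fused, so concatenation followed by a linear projection and a residual connection is an assumption here, and plain global self-attention stands in for Swin's shifted-window attention. The names `local_branch`, `global_branch`, `dual_branch_block`, `kernel`, and `w_fuse` are all hypothetical.

```python
import numpy as np


def local_branch(x, kernel):
    """Convolution-like branch: each token mixes only with its two neighbours.

    x: array of shape (n_tokens, dim); kernel: three scalar weights.
    """
    n = x.shape[0]
    padded = np.pad(x, ((1, 1), (0, 0)), mode="edge")
    return sum(kernel[k] * padded[k:k + n] for k in range(3))


def global_branch(x):
    """Attention-like branch: every token attends to every other token.

    Simplification: queries, keys, and values are all x (no learned weights).
    """
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x


def dual_branch_block(x, kernel, w_fuse):
    """Run both branches, concatenate their features, and project back.

    w_fuse: array of shape (2 * dim, dim). Fusion by concatenation is an
    assumption for this sketch, not the paper's stated design.
    """
    fused = np.concatenate([local_branch(x, kernel), global_branch(x)], axis=1)
    return x + fused @ w_fuse  # residual connection
```

With an input of shape `(6, 4)`, a kernel such as `[0.25, 0.5, 0.25]`, and a `w_fuse` of shape `(8, 4)`, the block returns an array of shape `(6, 4)`. Perturbing the last token leaves the local branch's output for the first token unchanged but alters the global branch's output for it, which is the complementary behaviour the abstract attributes to CNNs and Transformers.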
Taxonomy
Topics: Automated Road and Building Extraction · Remote Sensing and LiDAR Applications · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Batch Normalization · 1x1 Convolution · Bottleneck Residual Block · Average Pooling · Global Average Pooling · Residual Block
