Heuristical Comparison of Vision Transformers Against Convolutional Neural Networks for Semantic Segmentation on Remote Sensing Imagery
Ashim Dahal, Saydul Akbar Murad, and Nick Rahimi

TL;DR
This paper compares Vision Transformers and CNNs for semantic segmentation of remote sensing images, highlighting the impact of a weighted loss function and transfer learning on model performance and efficiency.
Contribution
It introduces a heuristic analysis of ViT versus CNN models, emphasizing the effects of a weighted loss function and transfer learning on segmentation accuracy.
Findings
Weighted fused loss improves CNN performance significantly.
CNN with weighted loss outperforms ViT in segmentation metrics.
Trade-offs identified between model accuracy and inference time.
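The segmentation metrics behind these findings (mIoU and Dice score) can be illustrated with a minimal sketch. It uses the standard per-class definitions computed from a confusion matrix; the function names and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels by (true class, predicted class)."""
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def miou_and_dice(y_true, y_pred, num_classes):
    """Return (mean IoU, mean Dice) over classes that appear at least once."""
    cm = confusion_matrix(y_true, y_pred, num_classes).astype(float)
    tp = np.diag(cm)               # correctly labelled pixels per class
    fp = cm.sum(axis=0) - tp       # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp       # belong to the class, but missed
    denom = tp + fp + fn
    present = denom > 0            # assumption: ignore classes never seen
    iou = tp[present] / denom[present]
    dice = 2 * tp[present] / (2 * tp[present] + fp[present] + fn[present])
    return iou.mean(), dice.mean()

# Toy 2x2 label maps with two classes.
truth = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou, dice = miou_and_dice(truth, pred, num_classes=2)
print(f"mIoU={miou:.4f}, Dice={dice:.4f}")
```

Both metrics reward overlap between predicted and true regions, but Dice counts the intersection twice, so it is never lower than IoU for the same class.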
Abstract
Vision Transformers (ViT) have recently brought a new wave of research in the field of computer vision. These models have performed particularly well in image classification and segmentation. Research on semantic and instance segmentation has accelerated with the introduction of the new architecture, with over 80% of the top 20 benchmarks for the iSAID dataset based on either the ViT architecture or the attention mechanism behind its success. This paper focuses on the heuristic comparison of three key factors of using (or not using) ViT for semantic segmentation of remote sensing aerial images on the iSAID dataset. The experimental results observed during this research were analyzed based on three objectives. First, we studied the use of a weighted fused loss function to maximize the mean Intersection over Union (mIoU) score and Dice score while minimizing entropy or class…
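The weighted fused loss mentioned in the abstract can be sketched as a weighted sum of a class-weighted cross-entropy term, a soft Dice term, and a soft IoU term. The abstract as shown here is truncated and does not give the formula, so this is only an illustrative sketch: the term weights `w_ce`, `w_dice`, `w_iou`, the `class_weights` vector, and the function names are all hypothetical and are not the authors' formulation.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fused_loss(logits, target, class_weights,
               w_ce=1.0, w_dice=1.0, w_iou=1.0, eps=1e-7):
    """Hypothetical fused loss over flattened pixels.

    logits: (N, C) per-pixel class scores; target: (N,) integer labels;
    class_weights: (C,) per-class weights for the cross-entropy term.
    """
    num_classes = logits.shape[1]
    probs = softmax(logits, axis=1)
    onehot = np.eye(num_classes)[target]

    # Class-weighted cross-entropy, normalised by the total pixel weight.
    pixel_w = class_weights[target]
    p_true = probs[np.arange(len(target)), target]
    ce = -(pixel_w * np.log(p_true + eps)).sum() / pixel_w.sum()

    # Soft (differentiable) overlap terms, averaged over classes.
    inter = (probs * onehot).sum(axis=0)
    total = probs.sum(axis=0) + onehot.sum(axis=0)
    dice = 1.0 - ((2.0 * inter + eps) / (total + eps)).mean()
    iou = 1.0 - ((inter + eps) / (total - inter + eps)).mean()

    return w_ce * ce + w_dice * dice + w_iou * iou

# Four pixels, three classes; weights favour the rarer classes.
target = np.array([0, 1, 2, 1])
weights = np.array([1.0, 2.0, 3.0])
good_logits = 20.0 * np.eye(3)[target]            # confident and correct
bad_logits = 20.0 * np.eye(3)[(target + 1) % 3]   # confident and wrong
print(fused_loss(good_logits, target, weights))
print(fused_loss(bad_logits, target, weights))
```

Minimizing the Dice and IoU terms pushes the corresponding scores up, while the cross-entropy term penalizes confident mistakes on individual pixels. In practice the three term weights would be tuned on a validation set.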
Taxonomy
Topics: Remote-Sensing Image Classification
Methods: Softmax · Attention Is All You Need
