Evaluation and Analysis of Deep Neural Transformers and Convolutional Neural Networks on Modern Remote Sensing Datasets

J. Alex Hurt; Trevor M. Bajkowski; Grant J. Scott; Curt H. Davis

arXiv:2508.02871 · cs.CV · August 6, 2025

TL;DR

This paper compares transformer-based and convolutional neural networks for object detection in high-resolution satellite imagery, demonstrating state-of-the-art results and analyzing their performance across multiple datasets.

Contribution

The paper provides a large-scale comparison of transformer-based and CNN object detection architectures on remote sensing datasets, highlighting their relative strengths and performance differences.

Findings

01. Transformers achieve state-of-the-art detection performance.

02. CNNs remain competitive in certain scenarios.

03. Performance varies with dataset complexity and feature extraction methods.

Abstract

In 2012, AlexNet established deep convolutional neural networks (DCNNs) as the state of the art in computer vision (CV), and these networks soon led in visual tasks across many domains, including remote sensing. With the publication of Visual Transformers, we are witnessing the second modern leap in computational vision, and as such, it is imperative to understand how various transformer-based neural networks perform on satellite imagery. While transformers have shown high levels of performance in natural language processing and CV applications, they have yet to be evaluated on a large scale on modern remote sensing data. In this paper, we explore the use of transformer-based neural networks for object detection in high-resolution electro-optical satellite imagery, demonstrating state-of-the-art performance on a variety of publicly available benchmark data sets. We compare eleven distinct bounding-box…
