Onboard Satellite Image Classification for Earth Observation: A Comparative Study of ViT Models
Thanh-Dung Le, Vu Nguyen Ha, Ti Ti Nguyen, Geoffrey Eappen, Prabhu Thiruvasagam, Hong-fu Chou, Duc-Dung Tran, Hung Nguyen-Kha, Luis M. Garces-Socarras, Jorge L. Gonzalez-Rios, Juan Carlos Merlano-Duncan, Symeon Chatzinotas

TL;DR
This paper compares various pre-trained vision Transformer models for onboard satellite land use classification, finding EfficientViT-M2 to be the most accurate, efficient, and robust model suitable for satellite Earth observation tasks.
Contribution
It provides a comprehensive comparison of ViT models for onboard satellite image classification, highlighting EfficientViT-M2 as the optimal choice for accuracy and energy efficiency.
Findings
EfficientViT-M2 achieves 98.76% accuracy, precision, and recall.
EfficientViT-M2 reduces power consumption by over 63% compared to other models.
Pre-trained ViT models outperform traditional CNN and ResNet models in this context.
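The summary above reports a single figure for accuracy, precision, and recall but does not say how the per-class scores were averaged. As a minimal sketch, assuming support-weighted averaging (an assumption, not something the page confirms), the three metrics can be computed as below. With this averaging, weighted recall equals accuracy by construction, while precision may still differ. The function name `weighted_metrics` and the class labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, weighted precision, weighted recall).

    Assumes support-weighted averaging over classes; the averaging
    scheme used in the paper is not stated on this page.
    """
    n = len(y_true)
    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    # Per-class precision = correct / predicted, weighted by class support.
    precision = sum(
        support[c] / n * (correct[c] / predicted[c] if predicted[c] else 0.0)
        for c in support
    )
    # Per-class recall = correct / support, weighted by class support.
    recall = sum(support[c] / n * (correct[c] / support[c]) for c in support)
    return accuracy, precision, recall


# Hypothetical land-use labels, for illustration only.
y_true = ["forest", "forest", "river", "river", "highway", "highway"]
y_pred = ["forest", "forest", "river", "highway", "highway", "highway"]
acc, prec, rec = weighted_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
```

In this toy example one "river" image is misclassified as "highway", so accuracy and weighted recall coincide while weighted precision comes out slightly higher.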
Abstract
This study focuses on identifying the most effective pre-trained model for land use classification in onboard satellite processing, with an emphasis on high accuracy, computational efficiency, and robustness against the noisy data conditions commonly encountered during satellite-based inference. Through extensive experimentation, we compare the performance of traditional CNN-based, ResNet-based, and various pre-trained Vision Transformer (ViT) models. Our findings demonstrate that pre-trained ViT models, particularly MobileViTV2 and EfficientViT-M2, outperform models trained from scratch in terms of accuracy and efficiency. These models achieve high performance with reduced computational requirements and exhibit greater resilience during inference under noisy conditions. While MobileViTV2 excelled on clean validation data, EfficientViT-M2 proved more robust when…
Taxonomy
Topics: Remote Sensing and Land Use · Remote-Sensing Image Classification · Advanced Computational Techniques and Applications
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer · Softmax · Label Smoothing · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer
