SceneMixer: Exploring Convolutional Mixing Networks for Remote Sensing Scene Classification
Mohammed Q. Alkhatib, Ali Jamali, Swalpa Kumar Roy

TL;DR
SceneMixer introduces a lightweight convolutional mixer network for remote sensing scene classification, effectively balancing accuracy and computational efficiency across challenging aerial and satellite imagery datasets.
Contribution
The paper presents a novel convolutional mixer architecture tailored for remote sensing, combining multi-scale spatial and channel mixing to improve classification performance.
Findings
Achieved 74.7% accuracy on the AID dataset
Attained 93.9% accuracy on the EuroSAT dataset
Demonstrated competitive efficiency compared with CNNs and ViTs
Abstract
Remote sensing scene classification plays a key role in Earth observation by enabling the automatic identification of land use and land cover (LULC) patterns from aerial and satellite imagery. Despite recent progress with convolutional neural networks (CNNs) and vision transformers (ViTs), the task remains challenging due to variations in spatial resolution, viewpoint, orientation, and background conditions, which often reduce the generalization ability of existing models. To address these challenges, this paper proposes a lightweight architecture based on the convolutional mixer paradigm. The model alternates between spatial mixing through depthwise convolutions at multiple scales and channel mixing through pointwise operations, enabling efficient extraction of both local and contextual information while keeping the number of parameters and computations low. Extensive experiments were…
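The abstract's claim that alternating depthwise spatial mixing with pointwise channel mixing keeps the parameter count low can be checked with simple arithmetic. The sketch below is illustrative only: the channel width (256), the kernel sizes (3, 5, 7), and the way the multi-scale branches are summed are assumptions made for this example, not values taken from the paper, and normalization layers and activations are ignored.

```python
# Rough parameter-count comparison. All sizes below are hypothetical
# illustration values, not the configuration used in the paper.

def standard_conv_params(c_in, c_out, k):
    """Weights plus biases of a dense k x k convolution."""
    return c_in * c_out * k * k + c_out

def depthwise_conv_params(channels, k):
    """One k x k filter per channel, plus one bias per channel."""
    return channels * k * k + channels

def pointwise_conv_params(c_in, c_out):
    """A 1 x 1 convolution that mixes channels only."""
    return c_in * c_out + c_out

def mixer_block_params(channels, kernel_sizes):
    """Multi-scale depthwise spatial mixing followed by one pointwise channel mix."""
    spatial = sum(depthwise_conv_params(channels, k) for k in kernel_sizes)
    channel = pointwise_conv_params(channels, channels)
    return spatial + channel

def dense_block_params(channels, kernel_sizes):
    """Same kernel sizes, but with dense convolutions in every branch."""
    return sum(standard_conv_params(channels, channels, k) for k in kernel_sizes)

if __name__ == "__main__":
    channels = 256            # assumed width
    kernel_sizes = (3, 5, 7)  # assumed multi-scale kernels
    mixer = mixer_block_params(channels, kernel_sizes)
    dense = dense_block_params(channels, kernel_sizes)
    print(f"mixer block: {mixer:,} parameters")
    print(f"dense block: {dense:,} parameters")
    print(f"ratio: {dense / mixer:.1f}x")
```

Under these assumed sizes, the depthwise branches cost on the order of channels × k² parameters each, while the dense equivalents cost channels² × k², which is where the savings come from. The actual figures for SceneMixer depend on its real widths, depths, and kernel choices.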
Taxonomy
Topics: Remote-Sensing Image Classification · Remote Sensing in Agriculture · Automated Road and Building Extraction
