Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing
Minh-Duc Vu, Zuheng Ming, Fangchen Feng, Bissmella Bahaduri, Anissa Mokraoui

TL;DR
This paper introduces an interactive masked image modeling approach that leverages self-supervised learning to improve multimodal object detection in remote sensing imagery, especially for small and barely visible objects.
Contribution
The paper proposes a novel interactive MIM method that enhances token interactions, addressing limitations of conventional MIM for fine-grained remote sensing object detection.
Findings
Improved detection accuracy on remote sensing datasets.
Effective use of self-supervised pre-training with unlabeled data.
Demonstrated superiority over traditional MIM methods.
Abstract
Object detection in remote sensing imagery plays a vital role in various Earth observation applications. However, unlike object detection in natural scene images, this task is particularly challenging due to the abundance of small, often barely visible objects across diverse terrains. To address these challenges, multimodal learning can be used to integrate features from different data modalities, thereby improving detection accuracy. Nonetheless, the performance of multimodal learning is often constrained by the limited size of labeled datasets. In this paper, we propose to use Masked Image Modeling (MIM) as a pre-training technique, leveraging self-supervised learning on unlabeled data to enhance detection performance. However, conventional MIM such as MAE, which uses masked tokens without any contextual information, struggles to capture fine-grained details due to a lack of…
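For readers unfamiliar with the conventional MIM baseline the abstract refers to, the following is a minimal sketch of MAE-style random patch masking, with the reconstruction loss computed only on the masked patches. It does not implement the paper's interactive method. The function names (`random_patch_mask`, `masked_mse`), the 4x4 patch grid, and the 0.75 mask ratio are assumptions chosen for illustration, and patches are represented as plain lists of floats rather than image tensors.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio=0.75, seed=0):
    """Split patch indices of a grid_h x grid_w grid into visible and masked sets.

    Hypothetical helper: mask_ratio=0.75 is an illustrative choice here.
    """
    n = grid_h * grid_w
    n_masked = int(round(n * mask_ratio))
    ids = list(range(n))
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    masked = sorted(ids[:n_masked])
    visible = sorted(ids[n_masked:])
    return visible, masked


def masked_mse(pred, target, masked_ids):
    """Mean squared error over the masked patches only.

    pred and target are lists of patches; each patch is a list of floats.
    """
    total, count = 0.0, 0
    for i in masked_ids:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# Usage: a 4x4 grid of 2-value patches; the "prediction" is all zeros.
visible, masked = random_patch_mask(4, 4)
target = [[1.0, 1.0] for _ in range(16)]
pred = [[0.0, 0.0] for _ in range(16)]
loss = masked_mse(pred, target, masked)
print(len(visible), len(masked), loss)
```

In this baseline the encoder sees only the visible patches, and the masked positions are filled with a shared placeholder token that carries no information about its surroundings, which is the limitation the abstract points to.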
Taxonomy
Topics: Satellite Image Processing and Photogrammetry · Remote Sensing and Land Use · Geographic Information Systems Studies
Methods: Mutual Information Machine/Mask Image Modeling · Masked autoencoder
