MsEdF: A Multi-stream Encoder-decoder Framework for Remote Sensing Image Captioning

Swadhin Das; Raksha Sharma

arXiv:2502.09282·cs.CV·May 1, 2026

TL;DR

This paper introduces MsEdF, a multi-stream encoder-decoder framework for remote sensing image captioning that enhances feature diversity and semantic modeling to improve descriptive accuracy.

Contribution

The multi-stream architecture fuses diverse spatial features and refines semantic context modeling, improving remote sensing image captioning (RSIC) performance over single-stream methods.

Findings

01. MsEdF outperforms baseline models on three benchmark datasets.

02. Fusing multiscale and structural cues enhances feature diversity.

03. Refined semantic modeling improves caption accuracy.

Abstract

Remote sensing images contain complex spatial patterns and semantic structures, which makes them difficult for a captioning model to describe accurately. Encoder-decoder architectures have become a widely used approach for remote sensing image captioning (RSIC), translating visual content into descriptive text. However, many existing methods rely on a single-stream architecture, which weakens the model's ability to describe the image accurately. Such single-stream architectures typically struggle to extract diverse spatial features or capture complex semantic relationships, limiting their effectiveness in scenes with high intraclass similarity or contextual ambiguity. In this work, we propose a novel Multi-stream Encoder-decoder Framework (MsEdF), which improves RSIC performance by optimizing both the spatial representation and the language generation of the encoder-decoder architecture. The encoder fuses information from two…
