A TextGCN-Based Decoding Approach for Improving Remote Sensing Image Captioning
Swadhin Das, Raksha Sharma

TL;DR
This paper introduces a novel TextGCN-based encoder-decoder framework for automatic remote sensing image captioning, significantly improving caption quality over existing methods through semantic embeddings and a fair search strategy.
Contribution
The paper presents a new TextGCN-enhanced encoder-decoder model combined with a comparison-based beam search for improved remote sensing image captioning.
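For readers unfamiliar with beam search, the following is a minimal sketch of a plain, length-normalised beam search over a toy next-word table. It is not the paper's comparison-based variant, whose details this summary does not give. The names `beam_search`, `toy_step`, and `TOY` are hypothetical and exist only for illustration.

```python
import math

def beam_search(step_fn, start, end, beam_width=3, max_len=10):
    """Generic beam search; step_fn(seq) returns {token: log_prob}."""
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for seq, score in beams:
            for tok, lp in step_fn(seq).items():
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the best few; move completed captions to `finished`.
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == end else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was hit
    # Length-normalise so that short captions are not favoured by default.
    return max(finished, key=lambda c: c[1] / len(c[0]))[0]

# Toy bigram "decoder" (assumption: stands in for a trained LSTM decoder).
TOY = {
    "<s>": {"a": math.log(0.6), "many": math.log(0.4)},
    "a": {"plane": math.log(0.5), "river": math.log(0.5)},
    "many": {"planes": math.log(1.0)},
    "plane": {"</s>": 0.0},
    "river": {"</s>": 0.0},
    "planes": {"</s>": 0.0},
}

def toy_step(seq):
    return TOY[seq[-1]]

caption = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(" ".join(caption))
```

In this toy table a greedy decoder would commit to "a" at the first step, while a beam of width 2 keeps "many" alive and ends up with the higher-scoring caption overall.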
Findings
Outperforms state-of-the-art encoder-decoder models on three datasets.
Achieves higher scores across BLEU, METEOR, ROUGE-L, and CIDEr metrics.
Demonstrates the effectiveness of semantic embeddings in image captioning.
Abstract
Remote sensing images are highly valued for their ability to address complex real-world issues such as risk management, security, and meteorology. However, manually captioning these images is challenging and requires specialized knowledge across various domains. This letter presents an approach for automatically describing (captioning) remote sensing images. We propose a novel encoder-decoder setup that deploys a Text Graph Convolutional Network (TextGCN) and multi-layer LSTMs. The embeddings generated by TextGCN enhance the decoder's understanding by capturing the semantic relationships among words at both the sentence and corpus levels. Furthermore, we advance our approach with a comparison-based beam search method to ensure fairness in the search strategy for generating the final caption. We present an extensive evaluation of our approach against various other state-of-the-art…
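The sentence-level and corpus-level word relationships mentioned in the abstract can be illustrated with the graph that a TextGCN is usually built on: caption-word edges weighted by TF-IDF and word-word edges weighted by positive pointwise mutual information (PMI) over sliding windows. The sketch below follows that general TextGCN recipe, not necessarily the exact configuration used in this paper. The function name `build_text_graph`, the window size, and the smoothed IDF formula are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def build_text_graph(captions, window=3):
    """Return (nodes, edges) for a TextGCN-style caption/word graph.

    Edges are undirected and stored once as {(node_a, node_b): weight}.
    """
    docs = [c.lower().split() for c in captions]
    vocab = sorted({w for d in docs for w in d})
    nodes = [f"doc{i}" for i in range(len(docs))] + vocab
    edges = {}

    # Caption-word edges: TF-IDF (smoothed IDF is an assumption here).
    df = Counter(w for d in docs for w in set(d))
    for i, d in enumerate(docs):
        for w, n in Counter(d).items():
            idf = math.log((1 + len(docs)) / (1 + df[w])) + 1
            edges[(f"doc{i}", w)] = (n / len(d)) * idf

    # Word-word edges: positive PMI over fixed-size sliding windows.
    windows = []
    for d in docs:
        if len(d) <= window:
            windows.append(d)
        else:
            windows.extend(d[k:k + window] for k in range(len(d) - window + 1))
    word_count = Counter(w for win in windows for w in set(win))
    pair_count = Counter(
        pair for win in windows for pair in combinations(sorted(set(win)), 2)
    )
    total = len(windows)
    for (a, b), n in pair_count.items():
        pmi = math.log(n * total / (word_count[a] * word_count[b]))
        if pmi > 0:
            edges[(a, b)] = pmi

    # Self-loops, so each node keeps its own features during convolution.
    for node in nodes:
        edges[(node, node)] = 1.0
    return nodes, edges

nodes, edges = build_text_graph([
    "many planes are parked near a terminal",
    "a river flows through a dense forest",
])
```

A graph convolution layer would then propagate features over a normalised version of this adjacency, so that each word embedding reflects both the captions it appears in and the words it co-occurs with across the corpus.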
