SD-RSIC: Summarization Driven Deep Remote Sensing Image Captioning
Gencer Sumbul, Sonali Nayak, Begüm Demir

TL;DR
This paper introduces SD-RSIC, a novel remote sensing image captioning method that summarizes training captions to reduce redundancy and adaptively combines them with standard captions, improving captioning performance.
Contribution
The paper proposes a new summarization and adaptive weighting strategy for RS image captioning, addressing redundancy in training captions and enhancing model effectiveness.
Findings
Outperforms state-of-the-art methods on multiple datasets
Effectively reduces redundancy in training captions
Improves captioning accuracy and semantic relevance
Abstract
Deep neural networks (DNNs) have recently become popular for image captioning problems in remote sensing (RS). Existing DNN-based approaches rely on the availability of a training set made up of a large number of RS images with their captions. However, the captions of training images may contain redundant information (they can be repetitive or semantically similar to each other), resulting in information deficiency while learning a mapping from the image domain to the language domain. To overcome this limitation, in this paper, we present a novel Summarization Driven Remote Sensing Image Captioning (SD-RSIC) approach. The proposed approach consists of three main steps. The first step obtains the standard image captions by jointly exploiting convolutional neural networks (CNNs) with long short-term memory (LSTM) networks. The second step, unlike the existing RS image captioning methods,…
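The adaptive combination of standard and summarized captions mentioned in the TL;DR can be illustrated with a small sketch. The abstract excerpt above is truncated before it describes the weighting mechanism, so everything here is an assumption made for illustration: the function name combine_word_distributions, the scalar gate_logit, the use of a sigmoid to produce the weight, and the toy vocabulary and probabilities are all hypothetical and not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_word_distributions(p_standard, p_summary, gate_logit):
    """Convex combination of two word distributions over one vocabulary.

    Hypothetical illustration only: the weight is a sigmoid of a single
    scalar, which is an assumption and not the paper's stated mechanism.
    """
    if len(p_standard) != len(p_summary):
        raise ValueError("both distributions must cover the same vocabulary")
    w = sigmoid(gate_logit)  # weight given to the standard-caption branch
    return [w * a + (1.0 - w) * b for a, b in zip(p_standard, p_summary)]


# Toy example with made-up numbers for a single decoding step.
vocab = ["airport", "runway", "plane", "river"]
p_standard = [0.50, 0.20, 0.20, 0.10]  # from the standard captioning branch
p_summary = [0.10, 0.60, 0.20, 0.10]   # from the summarization branch

# gate_logit = 0.0 gives w = 0.5, i.e. an equal mix of both branches.
mixed = combine_word_distributions(p_standard, p_summary, gate_logit=0.0)
best_word = vocab[max(range(len(vocab)), key=mixed.__getitem__)]
print(best_word, [round(p, 2) for p in mixed])
```

Because the weight lies between 0 and 1, the mixed vector remains a valid probability distribution whenever both inputs are. A large positive gate_logit makes the result follow the standard branch almost exclusively, and a large negative one favors the summarization branch.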
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
