New Encoder Learning for Captioning Heavy Rain Images via Semantic Visual Feature Matching

Chang-Hwan Son; Pung-Hwi Ye

arXiv:2105.13753 · cs.CV · July 3, 2025


TL;DR

This paper introduces a novel encoder that transforms heavy rain image features into semantic visual features, significantly improving captioning accuracy in adverse weather conditions by leveraging joint learning of reconstruction and feature matching.
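The summary names two learning signals, reconstruction and feature matching, but does not give their form. As a minimal sketch, assuming both are mean squared errors combined with a hypothetical weight `lam` (the paper's actual loss terms and weighting may differ), a joint objective of this kind could look like:

```python
# Minimal sketch of a joint objective: reconstruction + feature matching.
# Assumption: both terms are MSE and are combined with a hypothetical
# weight `lam`; the paper's exact losses are not stated in this summary.


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(reconstructed, clean, rain_features, target_features, lam=1.0):
    """Return (total, reconstruction term, matching term).

    reconstructed   -- image restored from the heavy rain input (flattened)
    clean           -- ground-truth clean image (flattened)
    rain_features   -- features the new encoder extracts from the rain image
    target_features -- semantic features from the pretrained target encoder,
                       treated here as a fixed target
    """
    l_rec = mse(reconstructed, clean)
    l_match = mse(rain_features, target_features)
    return l_rec + lam * l_match, l_rec, l_match


if __name__ == "__main__":
    total, l_rec, l_match = joint_loss(
        reconstructed=[0.5, 0.5], clean=[0.5, 1.0],
        rain_features=[1.0, 2.0], target_features=[1.0, 0.0],
        lam=0.5,
    )
    print(total, l_rec, l_match)  # 1.125 0.125 2.0
```

Treating `target_features` as a fixed input reflects the idea that a pretrained target encoder supplies the matching target; in a real training loop those features would be computed without backpropagating into the target encoder.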

Contribution

A new encoder architecture that converts heavy rain image features into semantic visual features, enhancing captioning performance under poor weather conditions.

Findings

1. The encoder effectively generates semantic features from heavy rain images.
2. Captioning accuracy improves significantly in rainy conditions.
3. End-to-end training of the encoder enhances robustness.

Abstract

Image captioning generates text that describes scenes from input images. Existing methods have been developed for high-quality images taken in clear weather. However, in bad weather conditions, such as heavy rain, snow, and dense fog, poor visibility caused by rain streaks, rain accumulation, and snowflakes seriously degrades image quality. This hinders the extraction of useful visual features and leads to poorer image captioning performance. To address this practical issue, this study introduces a new encoder for captioning heavy rain images. The central idea is to transform the output features extracted from heavy rain input images into semantic visual features associated with words and sentence context. To achieve this, a target encoder is first trained in an encoder-decoder framework to associate visual features with semantic words. Subsequently, the objects in a heavy rain…
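The abstract is cut off before it describes the second stage, so the following is only a toy sketch of the general idea it states: a target encoder, trained first with a caption decoder on clean images, is frozen and supplies semantic features, and a second encoder learns to reproduce those features from rain-degraded inputs. Everything concrete here is a hypothetical stand-in: the encoders are 2×2 linear maps, "heavy rain" is a fixed invertible linear degradation, and `train_rain_encoder` uses plain gradient descent on a squared-error matching loss. The real encoders are deep networks.

```python
# Toy sketch: learn a "rain encoder" whose features match a frozen target
# encoder. Hypothetical setup: linear 2x2 encoders and an invertible linear
# degradation standing in for heavy rain.


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def matching_loss(w_rain, pairs):
    """Mean squared distance between rain-encoder and target features."""
    total = 0.0
    for x_rain, target in pairs:
        y = matvec(w_rain, x_rain)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    return total / len(pairs)


def train_rain_encoder(w_target, degrade, clean_images, lr=1.0, steps=300):
    # Stage 1 (assumed already done): w_target stands in for a target encoder
    # trained with a caption decoder on clean images. It is never updated.
    pairs = [(matvec(degrade, x), matvec(w_target, x)) for x in clean_images]
    n_out, n_in = len(w_target), len(clean_images[0])
    w_rain = [[0.0] * n_in for _ in range(n_out)]
    history = [matching_loss(w_rain, pairs)]
    # Stage 2: gradient descent on the matching loss, updating only w_rain.
    for _ in range(steps):
        grad = [[0.0] * n_in for _ in range(n_out)]
        for x_rain, target in pairs:
            y = matvec(w_rain, x_rain)
            for i in range(n_out):
                err = y[i] - target[i]
                for j in range(n_in):
                    grad[i][j] += 2.0 * err * x_rain[j] / len(pairs)
        for i in range(n_out):
            for j in range(n_in):
                w_rain[i][j] -= lr * grad[i][j]
        history.append(matching_loss(w_rain, pairs))
    return w_rain, history


if __name__ == "__main__":
    w_target = [[1.0, 0.5], [0.0, 1.0]]
    degrade = [[0.6, 0.2], [0.1, 0.7]]
    clean = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    w_rain, history = train_rain_encoder(w_target, degrade, clean)
    print(f"matching loss: {history[0]:.4f} -> {history[-1]:.2e}")
    # Should approach w_target @ inverse(degrade).
    print([[round(v, 3) for v in row] for row in w_rain])
```

Because the toy degradation is invertible, the matching loss can reach zero, and the learned `w_rain` converges to the target encoder composed with the inverse degradation. Real rain is not invertible in this way, which is one reason a learned reconstruction component is useful.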


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Enhancement Techniques