Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu, Liang Li, Li Su, Chenggang Yan, Qingming Huang

TL;DR
This paper introduces a distractors-immune representation learning approach with cross-modal contrastive regularization for change captioning, improving robustness to distractors such as illumination and viewpoint changes.
Contribution
It proposes a novel distractors-immune representation learning network combined with cross-modal contrastive regularization, enhancing change captioning accuracy under distractors.
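This summary does not spell out how the cross-modal contrastive regularization is computed. One widely used form of cross-modal contrastive loss is InfoNCE. The sketch below assumes that form: each image pair's difference feature is pulled toward an embedding of its own caption, and the other captions in the batch act as negatives. The helpers `cosine` and `info_nce` and the temperature `tau` are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; eps avoids division by zero for all-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def info_nce(diff_feats: List[Vector], text_feats: List[Vector], tau: float = 0.1) -> float:
    """Average InfoNCE loss (hypothetical setup): the i-th difference feature
    should be closer to the i-th caption embedding than to any other caption."""
    total = 0.0
    for i, d in enumerate(diff_feats):
        logits = [cosine(d, t) / tau for t in text_feats]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # equals -log softmax(logits)[i]
    return total / len(diff_feats)


# Usage: correctly paired features give a near-zero loss; swapped pairs give a large one.
diffs = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(diffs, [[1.0, 0.0], [0.0, 1.0]]))  # small (about 4.5e-5)
print(info_nce(diffs, [[0.0, 1.0], [1.0, 0.0]]))  # large (about 10)
```

A symmetric variant also averages in the caption-to-difference direction, as in CLIP-style training. Which direction, or which text representation (whole caption or individual words), the paper actually uses is not stated here.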
Findings
Outperforms state-of-the-art methods on four datasets
Produces more stable representations under distractors
Improves caption relevance to true semantic changes
Abstract
Change captioning aims to succinctly describe the semantic change between a pair of similar images while remaining immune to distractors (illumination and viewpoint changes). Under these distractors, unchanged objects often exhibit pseudo changes in location and scale, and certain objects may overlap others, yielding perturbed and less discriminative features for the two images. However, most existing methods directly capture the difference between the two images, which risks producing error-prone difference features. In this paper, we propose a distractors-immune representation learning network that correlates the corresponding channels of the two image representations and decorrelates different channels in a self-supervised manner, thus obtaining a pair of image representations that stay stable under distractors. The model can then make them interact more effectively to capture reliable difference features…
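The abstract describes correlating corresponding channels of the two image representations and decorrelating different ones. That objective resembles a Barlow Twins-style redundancy-reduction loss on a channel cross-correlation matrix. The sketch below assumes that formulation; it is not taken from the paper. Rows are treated as samples (for example, spatial positions of the "before" and "after" feature maps) and columns as channels. The helper names and the off-diagonal weight `lam` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples (assumed: spatial positions), cols = channels


def standardize_columns(x: Matrix, eps: float = 1e-8) -> Matrix:
    """Normalize each channel to zero mean and unit variance; eps guards constant channels."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in x) / n) + eps for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in x]


def cross_correlation(x: Matrix, y: Matrix) -> Matrix:
    """C[i][j] = correlation between channel i of x and channel j of y."""
    xs, ys = standardize_columns(x), standardize_columns(y)
    n, d = len(xs), len(xs[0])
    return [[sum(xs[k][i] * ys[k][j] for k in range(n)) / n for j in range(d)] for i in range(d)]


def channel_decorrelation_loss(x: Matrix, y: Matrix, lam: float = 5e-3) -> float:
    """Push matching channels toward correlation 1 (diagonal) and
    different channels toward correlation 0 (off-diagonal, weighted by lam)."""
    c = cross_correlation(x, y)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lam * off_diag


# Usage: identical representations with uncorrelated channels give a loss near 0;
# swapping the two channels of y gives about 2 + 2 * lam.
x = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y_swapped = [[row[1], row[0]] for row in x]
print(channel_decorrelation_loss(x, x))          # close to 0
print(channel_decorrelation_loss(x, y_swapped))  # close to 2.01
```

The diagonal term rewards each channel for responding consistently across the two images despite distractors. The off-diagonal term discourages channels from encoding redundant information. How the paper weights or normalizes these terms may differ.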
