Recurrent Relational Memory Network for Unsupervised Image Captioning
Dan Guo, Yang Wang, Peipei Song, Meng Wang

TL;DR
This paper introduces R^2M, a memory-based network for unsupervised image captioning that avoids GANs, uses a two-stage memory mechanism for relational reasoning, and outperforms existing methods on benchmarks.
Contribution
The paper presents a novel R^2M network that replaces GANs with a memory-based approach for unsupervised image captioning, improving efficiency and performance.
Findings
R^2M outperforms state-of-the-art methods on benchmark datasets.
The proposed model has fewer parameters and higher computational efficiency.
It encodes visual context through unsupervised training on images, while its memory learns from an unrelated text corpus in a supervised fashion.
Abstract
Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where existing methods usually adopt GAN (Generative Adversarial Network) models. In this paper, we propose a novel memory-based network rather than a GAN, named Recurrent Relational Memory Network (R^2M). Unlike complicated and sensitive adversarial learning, which performs poorly on long sentence generation, R^2M implements a concepts-to-sentence memory translator through two-stage memory mechanisms, fusion and recurrent memories, correlating the relational reasoning between common visual concepts and the generated words over long periods. R^2M encodes visual context through unsupervised training on images, while enabling the memory to learn from an irrelevant textual corpus in a supervised fashion. Our solution enjoys fewer learnable parameters and higher computational efficiency than…
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
Methods: Memory Network
