Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present
Xinpeng Chen, Lin Ma, Wenhao Jiang, Jian Yao, Wei Liu

TL;DR
This paper introduces ARNet, a novel RNN regularization method that reconstructs the previous hidden state from the present one, improving caption generation and the modeling of long-term dependencies.
Contribution
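ARNet is a new architecture that, coupled with a conventional encoder-decoder framework, regularizes the transition dynamics of RNNs by reconstructing the previous hidden state from the current one, which improves caption quality and long-term dependency modeling.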
Findings
ARNet improves captioning performance over existing encoder-decoder models on both image captioning and source code captioning tasks.
ARNet reduces training-inference discrepancy in caption generation.
ARNet effectively models long-term dependencies in RNNs.
Abstract
Recently, caption generation with an encoder-decoder framework has been extensively studied and applied in different domains, such as image captioning, code captioning, and so on. In this paper, we propose a novel architecture, namely Auto-Reconstructor Network (ARNet), which, coupling with the conventional encoder-decoder framework, works in an end-to-end fashion to generate captions. ARNet aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator. Therefore, ARNet encourages the current hidden state to embed more information from the previous one, which can help regularize the transition dynamics of recurrent neural networks (RNNs). Extensive experimental results show that our proposed ARNet boosts the performance over the existing encoder-decoder models on both image captioning and source code captioning tasks.…
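The reconstruction idea described in the abstract can be illustrated with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the plain tanh RNN cell, the linear reconstructor (`W_r`, `b_r`), the squared-Euclidean penalty, and every function name here are hypothetical choices made for this example.

```python
import numpy as np


def rnn_step(x, h_prev, W_x, W_h, b):
    """One tanh RNN transition: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return np.tanh(W_x @ x + W_h @ h_prev + b)


def run_rnn(inputs, h0, W_x, W_h, b):
    """Unroll the RNN and return [h_0, h_1, ..., h_T]."""
    states = [h0]
    for x in inputs:
        states.append(rnn_step(x, states[-1], W_x, W_h, b))
    return states


def reconstruction_penalty(states, W_r, b_r):
    """Average squared Euclidean distance between each h_{t-1} and its
    reconstruction from h_t.

    Assumption: the reconstructor is a single linear map (W_r, b_r).
    The source only says the previous state is reconstructed from the
    present one; it does not specify this form.
    """
    total = 0.0
    for h_prev, h_curr in zip(states[:-1], states[1:]):
        h_prev_hat = W_r @ h_curr + b_r  # reconstruct the past from the present
        total += np.sum((h_prev_hat - h_prev) ** 2)
    return total / (len(states) - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_h, T = 4, 3, 5
    W_x = rng.normal(size=(d_h, d_in))
    W_h = rng.normal(size=(d_h, d_h))
    b = np.zeros(d_h)
    W_r = rng.normal(size=(d_h, d_h))
    b_r = np.zeros(d_h)

    inputs = [rng.normal(size=d_in) for _ in range(T)]
    states = run_rnn(inputs, np.zeros(d_h), W_x, W_h, b)
    print("reconstruction penalty:", reconstruction_penalty(states, W_r, b_r))
```

In training, a penalty of this kind would typically be added to the usual captioning loss with a weighting coefficient, for example `total_loss = caption_loss + lam * reconstruction_penalty(states, W_r, b_r)`, where `caption_loss` and `lam` are placeholders that this sketch does not define.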
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
