Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present

Xinpeng Chen; Lin Ma; Wenhao Jiang; Jian Yao; Wei Liu

arXiv:1803.11439 · cs.CV · April 10, 2018 · 27 cites

TL;DR

This paper introduces ARNet, a novel RNN regularization method that reconstructs past hidden states from present ones, improving caption generation and modeling long-term dependencies.

Contribution

ARNet is a new architecture that regularizes RNNs by reconstructing past states, enhancing caption quality and long-term dependency modeling.

Findings

01. ARNet improves captioning performance on both image captioning and source code captioning tasks.

02. ARNet reduces the discrepancy between training and inference in caption generation.

03. ARNet helps RNNs model long-term dependencies.

Abstract

Recently, caption generation with an encoder-decoder framework has been extensively studied and applied in different domains, such as image captioning and code captioning. In this paper, we propose a novel architecture, the Auto-Reconstructor Network (ARNet), which is coupled with the conventional encoder-decoder framework and works in an end-to-end fashion to generate captions. Besides acting as the input-dependent transition operator, ARNet aims to reconstruct the previous hidden state from the present one. ARNet therefore encourages the current hidden state to embed more information from the previous one, which can help regularize the transition dynamics of recurrent neural networks (RNNs). Extensive experimental results show that ARNet improves on existing encoder-decoder models in both image captioning and source code captioning tasks.…
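
The reconstruction idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain tanh RNN, a hypothetical linear reconstructor (`w_r`), a squared Euclidean penalty, and a hypothetical weight `lam` that mixes the penalty into the captioning loss. None of these details are stated in the abstract; they are chosen only to make the mechanism concrete.

```python
import math


def rnn_step(w_x, w_h, x, h_prev):
    """One tanh RNN transition: h_t = tanh(W_x x_t + W_h h_{t-1})."""
    h = []
    for row_x, row_h in zip(w_x, w_h):
        pre = sum(a * b for a, b in zip(row_x, x))
        pre += sum(a * b for a, b in zip(row_h, h_prev))
        h.append(math.tanh(pre))
    return h


def reconstruct_previous(w_r, h):
    """Assumed linear reconstructor: estimate h_{t-1} from h_t."""
    return [sum(a * b for a, b in zip(row, h)) for row in w_r]


def reconstruction_penalty(w_x, w_h, w_r, inputs, h0):
    """Mean squared distance between each h_{t-1} and its estimate from h_t."""
    h_prev, total = h0, 0.0
    for x in inputs:
        h = rnn_step(w_x, w_h, x, h_prev)
        h_hat = reconstruct_previous(w_r, h)
        total += sum((p - q) ** 2 for p, q in zip(h_prev, h_hat))
        h_prev = h
    return total / len(inputs)


def regularized_loss(caption_loss, penalty, lam=0.01):
    """Captioning loss plus the weighted penalty (lam is an assumed knob)."""
    return caption_loss + lam * penalty


# Toy 2-dimensional example with made-up weights and inputs.
w_x = [[0.5, -0.2], [0.1, 0.3]]
w_h = [[0.4, 0.0], [0.0, 0.4]]
w_r = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h0 = [0.0, 0.0]

penalty = reconstruction_penalty(w_x, w_h, w_r, inputs, h0)
loss = regularized_loss(caption_loss=2.0, penalty=penalty)
```

In a trained model the penalty would be minimized jointly with the captioning loss, so a small penalty means the present hidden state still carries enough information to recover the previous one.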

Code & Models

Repositories

chenxinpeng/ARNet (tf, Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning