Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation

Shiyang Yan; Yang Hua; Neil M. Robertson

arXiv:2006.11714·cs.CV·June 23, 2020

TL;DR

This paper introduces an off-policy self-critical reinforcement learning method for Transformer-based visual paragraph generation. A GRU behaviour policy performs the sampling, and truncated relative importance sampling (TRIS) together with KL-control reduces the variance of the importance weights.

Contribution

It proposes a novel off-policy RL algorithm with TRIS and KL-control for Transformer models, enabling efficient training for visual paragraph generation.

Findings

01. Achieved state-of-the-art results on visual paragraph generation

02. Improved image captioning performance

03. Reduced variance in importance sampling

Abstract

Recently, several approaches have been proposed to solve language generation problems. The Transformer is currently the state-of-the-art seq-to-seq model in language generation. Reinforcement Learning (RL) is useful for addressing exposure bias and for optimising non-differentiable metrics in seq-to-seq language learning. However, the Transformer is hard to combine with RL because sampling requires costly computing resources. We tackle this problem by proposing an off-policy RL algorithm in which a behaviour policy represented by GRUs performs the sampling. We reduce the high variance of importance sampling (IS) by applying the truncated relative importance sampling (TRIS) technique and the Kullback-Leibler (KL)-control concept. TRIS is a simple yet effective technique, and there is a theoretical proof that KL-control helps to reduce the variance of IS. We formulate this off-policy RL based…
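
To make the role of the importance weight concrete, here is a minimal sketch of how a truncated relative importance weight could scale a self-critical advantage when the sample comes from a behaviour policy. The abstract does not give the formula, so the form used here is an assumption: the relative ratio w = π / (βπ + (1 − β)b), with π the target (Transformer) policy and b the behaviour (GRU) policy, truncated at a constant. The function names, the mixing parameter `beta`, and the threshold `clip` are hypothetical and chosen only for illustration.

```python
import math


def tris_weight(logp_target, logp_behaviour, beta=0.5, clip=2.0):
    """Truncated relative importance weight for one sampled sequence.

    Assumed form (not taken from the abstract):
        w = min(clip, pi / (beta * pi + (1 - beta) * b))
    computed from sequence log-probabilities for numerical stability.
    """
    # exp(log b - log pi) equals b / pi; cap the exponent to avoid overflow.
    inv_ratio = math.exp(min(logp_behaviour - logp_target, 700.0))
    denominator = beta + (1.0 - beta) * inv_ratio
    if denominator == 0.0:
        # Only reachable when beta == 0 and b / pi underflows to zero.
        return clip
    return min(1.0 / denominator, clip)


def off_policy_scst_coefficient(reward_sampled, reward_greedy, weight):
    """Scalar that multiplies the gradient of log pi(sampled sequence).

    Self-critical training uses the reward of the greedy decode as the
    baseline; the importance weight corrects for sampling from b.
    """
    return weight * (reward_sampled - reward_greedy)
```

With `beta = 0` the weight reduces to the ordinary importance ratio π / b, and for `beta > 0` it is bounded above by `1 / beta` even before truncation, which is one intuition for why a relative ratio has lower variance than the plain one.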

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout · Byte Pair Encoding