Efficient Reinforcement Learning for Unsupervised Controlled Text Generation

Bhargav Upadhyay; Akhilesh Sudhakar; Arjun Maheswaran

arXiv:2204.07696 · cs.CL · April 19, 2022 · 1 citation

TL;DR

This paper introduces a dense reward strategy for reinforcement learning in unsupervised controlled text generation: each generated token receives a reward, which improves training efficiency and style transfer quality over existing reward-shaping methods.

Contribution

The paper proposes a novel dense reward approach for RL in text generation that improves training efficiency and transfer quality compared to reward-shaping techniques.

Findings

01. 21% improvement in human evaluation of style transfer

02. 12% improvement in automatic evaluation metrics

03. 2.5x increase in sample efficiency and 7x faster training

Abstract

Controlled text generation tasks such as unsupervised text style transfer have increasingly adopted Reinforcement Learning (RL). A major challenge in applying RL to such tasks is the sparse reward, which is available only after the full text is generated. Sparse rewards, combined with a large action space, make RL training sample-inefficient and difficult to converge. Recently proposed reward-shaping strategies to address this issue have shown only negligible gains. In contrast, this work proposes a novel approach that provides dense rewards to each generated token. We evaluate our approach by applying it to unsupervised text style transfer. Averaged across datasets, our style transfer system improves upon current state-of-the-art systems by 21% on human evaluation and 12% on automatic evaluation. Upon ablated comparison with the current reward shaping approach (the `roll-out…
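To make the sparse-versus-dense distinction concrete, here is a minimal sketch of how the per-token return (the learning signal in a policy-gradient method such as REINFORCE) differs between the two settings. It is an illustration only, not the paper's method: the excerpt does not describe how the dense rewards are computed, so the function name `returns_to_go`, the four-token sequence, and every reward value below are hypothetical.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every token position t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each position accumulates all later rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Sparse setting: a single reward arrives only after the full text is generated.
# (Hypothetical value for a four-token output.)
sparse_rewards = [0.0, 0.0, 0.0, 1.0]

# Dense setting: every token gets its own reward.
# (Hypothetical values; the paper's actual scoring is not shown in this excerpt.)
dense_rewards = [0.0, 0.5, 0.0, 0.5]

sparse_returns = returns_to_go(sparse_rewards)
dense_returns = returns_to_go(dense_rewards)

print("sparse:", sparse_returns)
print("dense: ", dense_returns)
```

In the sparse case every token receives the same return, so the learner cannot tell which tokens earned the reward. In the dense case the returns differ by position, which gives a more specific credit signal per token.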

Taxonomy

Topics: Human Motion and Animation · Music and Audio Processing · Artificial Intelligence in Games