Toward Diverse Text Generation with Inverse Reinforcement Learning
Zhan Shi, Xinchi Chen, Xipeng Qiu, Xuanjing Huang

TL;DR
This paper applies inverse reinforcement learning to text generation to address reward sparsity and mode collapse, producing more diverse and higher-quality generated texts.
Contribution
It applies inverse reinforcement learning to text generation: the learned reward function provides dense reward signals, and the entropy-regularized policy promotes diversity in the generated texts.
Findings
Generated texts are of higher quality than those produced by previous methods.
The method encourages more diverse text generation.
Reward signals are denser, improving training stability.
Abstract
Text generation is a crucial task in NLP. Recently, several adversarial generative models have been proposed to alleviate the exposure bias problem in text generation. Though these models have achieved great success, they still suffer from reward sparsity and mode collapse. To address these two problems, we employ inverse reinforcement learning (IRL) for text generation. Specifically, the IRL framework learns a reward function from the training data, and then an optimal policy that maximizes the expected total reward. As in the adversarial models, the reward and policy functions in IRL are optimized alternately. Our method has two advantages: (1) the reward function can produce denser reward signals; (2) the generation policy, trained by "entropy regularized" policy gradient, is encouraged to generate more diversified texts. Experiment results demonstrate that…
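As an illustration of why entropy regularization encourages diversity, here is a minimal sketch on a toy problem. It is not the paper's model: it assumes a single-step categorical policy over three candidate outputs and a fixed, hand-picked reward vector standing in for a learned reward function. All names (`objective_and_grad`, `train_policy`, `beta`) are hypothetical. The objective is the expected reward plus `beta` times the policy entropy, maximized by exact gradient ascent on the logits.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def objective_and_grad(logits, rewards, beta):
    """Return J = E_p[r] + beta * H(p) and its exact gradient w.r.t. the logits."""
    p = softmax(logits)
    h = entropy(p)
    expected_reward = np.dot(p, rewards)
    # Gradient of the expected reward through the softmax.
    grad_reward = p * (rewards - expected_reward)
    # Gradient of the entropy through the softmax.
    grad_entropy = -p * (np.log(p + 1e-12) + h)
    return expected_reward + beta * h, grad_reward + beta * grad_entropy

def train_policy(rewards, beta, steps=2000, lr=0.5):
    """Plain gradient ascent on the logits, starting from a uniform policy."""
    logits = np.zeros_like(rewards, dtype=float)
    for _ in range(steps):
        _, grad = objective_and_grad(logits, rewards, beta)
        logits += lr * grad
    return softmax(logits)

# Toy rewards (an assumption): two good outputs and one poor one.
rewards = np.array([1.0, 0.9, 0.1])
greedy = train_policy(rewards, beta=0.0)    # no entropy bonus
diverse = train_policy(rewards, beta=0.5)   # entropy-regularized

print("beta=0.0:", np.round(greedy, 3), "entropy:", round(entropy(greedy), 3))
print("beta=0.5:", np.round(diverse, 3), "entropy:", round(entropy(diverse), 3))
```

Without the entropy term the policy drifts toward the single highest-reward output, a toy analogue of mode collapse. With it, probability mass stays spread across both good outputs, approaching a distribution proportional to exp(reward / beta).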
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
