Generating Text with Deep Reinforcement Learning
Hongyu Guo

TL;DR
This paper presents a sequence-to-sequence decoding method based on deep reinforcement learning. A Deep Q-Network (DQN) iteratively refines the generated sequence, tackling easier portions first and then the difficult parts, and the method outperforms baseline decoders on unseen sentences.
Contribution
Introduces a reinforcement-learning-based decoding schema for sequence generation that iteratively refines the output sequence, concentrating on its difficult parts, and demonstrates improved performance on unseen sentences.
Findings
Outperforms the baseline decoder in BLEU score on unseen sentences
Effectively focuses on difficult sequence parts during decoding
Achieves competitive results on training data
Abstract
We introduce a novel schema for sequence-to-sequence learning with a Deep Q-Network (DQN), which decodes the output sequence iteratively. The aim is to enable the decoder to first tackle the easier portions of a sequence and then turn to the difficult parts. Specifically, in each iteration, an encoder-decoder Long Short-Term Memory (LSTM) network is employed to automatically create, from the input sequence, features that represent the internal states of the DQN, and to formulate a list of potential actions for it. Taking the rephrasing of a natural sentence as an example, this list can contain ranked potential words. Next, the DQN learns to decide which action (e.g., word) will be selected from the list to modify the current decoded sequence. The newly modified output sequence is subsequently used as the input to the DQN for the next decoding iteration. In each iteration, we also…
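The iterative decoding loop described in the abstract can be sketched as follows. This is a minimal toy sketch under stated assumptions, not the author's implementation: a hand-written ranked candidate list per position (`CANDIDATES`) stands in for the encoder-decoder LSTM's ranked potential words; one-hot position and word features (`features`) stand in for the LSTM-derived state features; a positional match count (`score`) stands in for BLEU; the reward is assumed to be the change in that score; each iteration is assumed to sweep positions left to right; and a linear Q-function trained with one-step Q-learning replaces the deep network (no experience replay or target network). All names are hypothetical.

```python
import random

import numpy as np

# Toy setup (hypothetical): a reference sentence and, for each position, a
# ranked list of potential words standing in for the LSTM decoder's top-K.
REFERENCE = ["the", "cat", "sat", "on", "the", "mat"]
CANDIDATES = [
    ["a", "the", "this"],
    ["dog", "cat", "rat"],
    ["sat", "ran", "slept"],
    ["in", "on", "at"],
    ["a", "the", "my"],
    ["rug", "mat", "bed"],
]
VOCAB = sorted({w for ranked in CANDIDATES for w in ranked})
SEQ_LEN, K = len(REFERENCE), len(CANDIDATES[0])
FEAT_DIM = SEQ_LEN + len(VOCAB)


def features(seq, pos):
    """Assumed state features: one-hot position plus one-hot current word.
    The paper derives state features from an encoder-decoder LSTM instead."""
    phi = np.zeros(FEAT_DIM)
    phi[pos] = 1.0
    phi[SEQ_LEN + VOCAB.index(seq[pos])] = 1.0
    return phi


def score(seq):
    """Stand-in for BLEU: number of positions matching the reference."""
    return sum(a == b for a, b in zip(seq, REFERENCE))


def q_values(weights, phi):
    # Linear Q-function (stand-in for the DQN): one value per candidate rank.
    return weights @ phi


def run_episode(weights, iterations=3, epsilon=0.1, alpha=0.1, gamma=0.9, learn=True):
    # Start from the greedy decode: the top-ranked word at every position.
    seq = [ranked[0] for ranked in CANDIDATES]
    steps = iterations * SEQ_LEN
    for t in range(steps):
        pos = t % SEQ_LEN  # assumption: each iteration sweeps left to right
        phi = features(seq, pos)
        if learn and random.random() < epsilon:
            action = random.randrange(K)  # explore: random candidate rank
        else:
            action = int(np.argmax(q_values(weights, phi)))  # exploit
        new_seq = seq.copy()
        new_seq[pos] = CANDIDATES[pos][action]  # modify the decoded sequence
        reward = score(new_seq) - score(seq)  # assumed reward: score gain
        if learn:
            target = reward
            if t + 1 < steps:  # bootstrap from the next state unless terminal
                next_phi = features(new_seq, (pos + 1) % SEQ_LEN)
                target += gamma * np.max(q_values(weights, next_phi))
            td_error = target - q_values(weights, phi)[action]
            weights[action] += alpha * td_error * phi  # semi-gradient Q-learning
        seq = new_seq  # the modified sequence feeds the next decision
    return seq


if __name__ == "__main__":
    random.seed(0)
    weights = np.zeros((K, FEAT_DIM))
    for _ in range(500):
        run_episode(weights)
    decoded = run_episode(weights, learn=False)
    print(" ".join(decoded), "| matches:", score(decoded), "/", SEQ_LEN)
```

Starting from the greedy decode, each pass lets the agent replace a word with another entry from that position's ranked list, so later iterations can revisit positions that earlier passes left wrong. Swapping in real LSTM features, a BLEU-based reward, and a neural Q-network with replay would move this sketch closer to the setup the abstract describes.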
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Q-Learning · Dense Connections · Convolution · Deep Q-Network · Long Short-Term Memory
