Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck

TL;DR
This paper introduces Sequence Tutor, a method combining pre-trained RNNs with reinforcement learning via KL-control to enhance sequence quality and structure while preserving learned information, demonstrated on music and molecular generation.
Contribution
It derives novel off-policy RL methods from KL-control for fine-tuning pre-trained RNNs, improving sequence quality while retaining the information learned from the original data.
Findings
Improved sequence structure and quality in music and molecular generation
Maintained diversity and learned information from data
Demonstrated effectiveness on two distinct applications
Abstract
This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity. An RNN is first pre-trained on data using maximum likelihood estimation (MLE), and the probability distribution over the next token in the sequence learned by this model is treated as a prior policy. Another RNN is then trained using reinforcement learning (RL) to generate higher-quality outputs that account for domain-specific incentives while retaining proximity to the prior policy of the MLE RNN. To formalize this objective, we derive novel off-policy RL methods for RNNs from KL-control. The effectiveness of the approach is demonstrated on two applications: 1) generating novel musical melodies, and 2) computational molecular generation. For both problems, we…
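The trade-off the abstract describes, earning task reward while staying close to the MLE prior, can be illustrated with a single-step toy problem. The sketch below is not the paper's algorithm (which derives off-policy methods for RNNs over whole sequences); it only shows a KL-regularized objective over one next-token distribution. The function names, the scaling constant `c`, and the example numbers are all assumptions made for illustration.

```python
import math


def kl_divergence(q, p):
    """KL[q || p] for two discrete distributions given as lists of probabilities.

    Assumes every entry of p is strictly positive.
    """
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def kl_regularized_objective(q, p, task_reward, c=1.0):
    """Expected task reward under q (scaled by 1/c) minus KL[q || p].

    q: candidate policy over next tokens (hypothetical fine-tuned model)
    p: prior policy over next tokens (hypothetical pre-trained MLE model)
    task_reward: one domain-specific reward per token
    c: assumed constant trading off reward against closeness to the prior
    """
    expected_reward = sum(qi * ri for qi, ri in zip(q, task_reward))
    return expected_reward / c - kl_divergence(q, p)


def optimal_single_step_policy(p, task_reward, c=1.0):
    """Maximizer of the single-step objective: q*(a) proportional to p(a) * exp(r(a) / c)."""
    weights = [pi * math.exp(ri / c) for pi, ri in zip(p, task_reward)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example with three tokens (all numbers are made up).
prior = [0.5, 0.3, 0.2]
reward = [0.0, 1.0, 2.0]
greedy = [0.0, 0.0, 1.0]  # always pick the highest-reward token
tilted = optimal_single_step_policy(prior, reward)

for name, q in [("prior", prior), ("greedy", greedy), ("tilted", tilted)]:
    print(name, round(kl_regularized_objective(q, prior, reward), 4))
```

In this toy setting the tilted policy scores higher than both the unchanged prior and the purely greedy policy, which is the intuition behind fine-tuning conservatively rather than maximizing reward alone.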
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neural Networks and Reservoir Computing
