GROOT: Corrective Reward Optimization for Generative Sequential Labeling
Kazuma Hashimoto, Karthik Raman

TL;DR
GROOT introduces a framework that aligns generative sequence models with specific reward metrics through iterative correction and contrast, improving reward metrics on four benchmarks.
Contribution
The paper presents a novel reward optimization method for generative sequential labeling that better aligns training objectives with the reward metrics used in practice.
Findings
Significant improvements in reward metrics across four benchmarks
Enhanced quality of top-k candidate sequences
Effective correction and contrastive training regime
Abstract
Sequential labeling is a fundamental NLP task, forming the backbone of many applications. Supervised learning of Seq2Seq models has shown great success on these problems. However, the training objectives are still significantly disconnected from the metrics and desiderata we care about in practice. For example, a practical sequence tagging application may want to optimize for a certain precision-recall trade-off (of the top-k predictions), which is quite different from the standard objective of maximizing the likelihood of the gold labeled sequence. Thus, to bridge this gap, we propose GROOT -- a simple yet effective framework for Generative Reward Optimization Of Text sequences. GROOT works by training a generative sequential labeling model to match the decoder output distribution with that of the (black-box) reward function. Using an iterative training regime, we first generate…
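The abstract names two ingredients: a black-box reward that encodes a precision-recall trade-off, and a training signal that pulls the decoder's output distribution toward the reward's. The sketch below illustrates both under assumptions of our own, not the paper's: the reward is a hypothetical token-level F-beta score over non-"O" tags (f_beta_reward), and "matching" is approximated as the KL divergence from a temperature-scaled softmax of candidate rewards to the model's distribution renormalized over the same k candidates, for example a top-k beam (reward_matching_loss). GROOT's actual objective and its correction/contrast steps are not detailed in the abstract and may differ.

```python
import math
from typing import List, Sequence


def f_beta_reward(pred: Sequence[str], gold: Sequence[str],
                  beta: float = 1.0, outside: str = "O") -> float:
    """Hypothetical black-box reward: token-level F-beta over non-outside tags.

    beta > 1 weights recall more heavily; beta < 1 weights precision.
    """
    assert len(pred) == len(gold), "pred and gold must be aligned"
    tp = sum(1 for p, g in zip(pred, gold) if p == g and g != outside)
    n_pred = sum(1 for p in pred if p != outside)
    n_gold = sum(1 for g in gold if g != outside)
    if n_pred == 0 or n_gold == 0:
        # Predicting no tags is only correct when there are no gold tags.
        return 1.0 if n_pred == n_gold else 0.0
    precision, recall = tp / n_pred, tp / n_gold
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def log_softmax(xs: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of xs / temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def reward_matching_loss(model_logprobs: Sequence[float],
                         rewards: Sequence[float],
                         temperature: float = 0.1) -> float:
    """Assumed objective: KL(target || model) over k candidate sequences.

    model_logprobs: sequence-level log p(y | x) for each candidate.
    rewards: black-box reward for each candidate.
    The model is renormalized over the candidate set only.
    """
    log_target = log_softmax(rewards, temperature)
    log_model = log_softmax(model_logprobs)
    return sum(math.exp(lt) * (lt - lq) for lt, lq in zip(log_target, log_model))


gold = ["B-PER", "I-PER", "O", "B-LOC"]
candidates = [
    ["B-PER", "I-PER", "O", "B-LOC"],  # exact match
    ["B-PER", "I-PER", "O", "O"],      # misses the location
    ["O", "O", "O", "B-LOC"],          # misses the person
]
rewards = [f_beta_reward(c, gold) for c in candidates]
loss_good = reward_matching_loss([-1.0, -3.0, -5.0], rewards)  # favors best candidate
loss_bad = reward_matching_loss([-5.0, -3.0, -1.0], rewards)   # favors worst candidate
```

A lower temperature sharpens the target toward the highest-reward candidate, while a higher one spreads probability mass across near-miss candidates; raising beta above 1 would let the same pipeline favor recall over precision.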
Taxonomy
Topics: Data Visualization and Analytics · Natural Language Processing Techniques · Machine Learning and Data Classification
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence
