GROOT: Corrective Reward Optimization for Generative Sequential Labeling

Kazuma Hashimoto; Karthik Raman

arXiv:2209.14694·cs.CL·December 22, 2022

TL;DR

GROOT is a framework that aligns generative sequential labeling models with a chosen reward metric through iterative correction and contrastive training, improving reward metrics on four benchmarks.

Contribution

The paper presents a reward optimization method for generative sequential labeling that brings the training objective closer to the reward metrics used in practice.

Findings

01. Significant improvements in reward metrics across four benchmarks

02. Enhanced quality of top-k candidate sequences

03. Effective correction and contrastive training regime

Abstract

Sequential labeling is a fundamental NLP task, forming the backbone of many applications. Supervised learning of Seq2Seq models has shown great success on these problems. However, the training objectives are still significantly disconnected from the metrics and desiderata we care about in practice. For example, a practical sequence tagging application may want to optimize for a certain precision-recall trade-off (of the top-k predictions), which is quite different from the standard objective of maximizing the likelihood of the gold labeled sequence. Thus, to bridge this gap, we propose GROOT -- a simple yet effective framework for Generative Reward Optimization Of Text sequences. GROOT works by training a generative sequential labeling model to match the decoder output distribution with that of the (black-box) reward function. Using an iterative training regime, we first generate…
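
The idea of scoring candidate label sequences with a reward and training the model to prefer the higher-reward ones can be illustrated with a minimal Python sketch. This is not the paper's implementation: it assumes a token-level F-beta score as a stand-in for the black-box reward, and a pairwise margin loss as one possible way to push the model's candidate scores toward the reward ordering. The function names `f_beta_reward` and `pairwise_margin_loss`, the `O` (outside) label convention, and the margin value are all hypothetical choices made for illustration.

```python
from typing import Sequence


def f_beta_reward(pred: Sequence[str], gold: Sequence[str], beta: float = 1.0) -> float:
    """Token-level F-beta over non-'O' labels.

    Hypothetical stand-in for a black-box reward; beta > 1 favors recall,
    beta < 1 favors precision.
    """
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    if n_pred == 0 and n_gold == 0:
        return 1.0  # assumption: nothing to find and nothing predicted is a perfect match
    tp = sum(1 for p, g in zip(pred, gold) if p == g and g != "O")
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def pairwise_margin_loss(
    log_probs: Sequence[float], rewards: Sequence[float], margin: float = 1.0
) -> float:
    """Average hinge loss over candidate pairs where candidate i has higher reward than j.

    The loss is zero once every higher-reward candidate has a model
    log-probability at least `margin` above each lower-reward one.
    This is an illustrative contrastive objective, not the loss from the paper.
    """
    total, pairs = 0.0, 0
    for i in range(len(rewards)):
        for j in range(len(rewards)):
            if rewards[i] > rewards[j]:
                total += max(0.0, margin - (log_probs[i] - log_probs[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


# Usage: score two hypothetical candidates against a gold sequence.
gold = ["B-PER", "O", "B-LOC"]
candidates = [["B-PER", "O", "O"], ["O", "O", "O"]]
rewards = [f_beta_reward(c, gold) for c in candidates]
loss = pairwise_margin_loss([-1.0, -0.5], rewards)
```

In this toy case the model assigns a higher log-probability to the lower-reward candidate, so the loss is positive; a training step would reduce it by raising the score of the first candidate relative to the second.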

Taxonomy

Topics: Data Visualization and Analytics · Natural Language Processing Techniques · Machine Learning and Data Classification

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence