RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation
Jacob Parnell, Inigo Jauregi Unanue, Massimo Piccardi

TL;DR
This paper investigates novel reinforcement learning reward functions for abstractive summarisation, demonstrating that these can improve performance over traditional NLL training across diverse datasets.
Contribution
Introduces two new reward functions, RwB-Hinge and RISK, for reinforcement learning in summarisation, and empirically evaluates their effectiveness.
Findings
Both reward functions outperform NLL baselines.
Consistent improvements across nine datasets.
The choice of reward function plays a key role in summarisation performance.
Abstract
To date, most abstractive summarisation models have relied on variants of the negative log-likelihood (NLL) as their training objective. In some cases, reinforcement learning has been added to train the models with an objective that is closer to their evaluation measures (e.g. ROUGE). However, the reward function to be used within the reinforcement learning approach can play a key role for performance and is still partially unexplored. For this reason, in this paper, we propose two reward functions for the task of abstractive summarisation: the first function, referred to as RwB-Hinge, dynamically selects the samples for the gradient update. The second function, nicknamed RISK, leverages a small pool of strong candidates to inform the reward. In the experiments, we probe the proposed approach by fine-tuning an NLL pre-trained model over nine summarisation datasets of diverse size and…
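The two reward ideas named in the abstract can be illustrated with a rough sketch. The abstract as shown here gives no formulas, so everything below is an assumption made for illustration rather than the paper's definitions: the function names, the use of a greedy-decode reward as the baseline, the hinge on the reward difference, and the renormalisation of model probabilities over the candidate pool. Rewards stand in for a sentence-level score such as ROUGE.

```python
import math


def rwb_hinge_loss(sample_logprob, sample_reward, baseline_reward):
    """Hinge-gated REINFORCE-with-baseline loss for one sampled summary.

    Assumed form: the sample only contributes to the gradient update
    when its reward beats the baseline reward.
    """
    # Advantage of the sampled summary over the baseline (hypothetically a greedy decode).
    advantage = sample_reward - baseline_reward
    # Hinge: samples that do not beat the baseline give zero loss, hence no update.
    return -max(0.0, advantage) * sample_logprob


def risk_loss(candidate_logprobs, candidate_rewards):
    """Expected negative reward over a small pool of candidate summaries.

    Assumed form: model probabilities are renormalised over the pool,
    so higher-reward candidates are pushed towards higher probability.
    """
    # Numerically stable softmax over the candidates' sequence log-probabilities.
    top = max(candidate_logprobs)
    weights = [math.exp(lp - top) for lp in candidate_logprobs]
    total = sum(weights)
    return sum(-(w / total) * r for w, r in zip(weights, candidate_rewards))
```

In this reading, a sampled summary scoring below the baseline is simply skipped, which is one way to interpret "dynamically selects the samples for the gradient update". In a real training loop the log-probabilities would be differentiable model outputs rather than plain floats.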
