RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Lei Shu; Liangchen Luo; Jayakumar Hoskere; Yun Zhu; Yinxiao Liu; Simon Tong; Jindong Chen; Lei Meng

arXiv:2305.15685·cs.CL·December 21, 2023·5 cites

TL;DR

RewriteLM is a large language model fine-tuned with instruction and reinforcement learning techniques to excel at diverse cross-sentence text rewriting tasks, supported by a new benchmark and data generation methods.

Contribution

The paper introduces new instruction tuning and reinforcement learning strategies for LLMs to improve cross-sentence rewriting capabilities, along with a novel benchmark dataset.

Findings

01. Significant performance improvements over baselines

02. Effective data generation from Wiki edits and public corpora

03. Introduction of the OpenRewriteEval benchmark

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities in creative tasks such as storytelling and E-mail generation. However, as LLMs are primarily trained on final text results rather than intermediate revisions, it might be challenging for them to perform text rewriting tasks. Most studies of rewriting tasks focus on a particular transformation type within the boundaries of single sentences. In this work, we develop new strategies for instruction tuning and reinforcement learning to better align LLMs for cross-sentence rewriting tasks using diverse wording and structures expressed through natural language, including 1) generating rewriting instruction data from Wiki edits and public corpora through instruction generation and chain-of-thought prompting; and 2) collecting comparison data for reward model training through a new ranking function. To facilitate this…
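The abstract mentions collecting comparison data for reward model training through a ranking function but does not describe that function here. As a rough illustration only, the sketch below shows one common way ranked candidates can be turned into (preferred, rejected) pairs. The names `build_comparison_pairs`, `rank_score`, and `min_margin` are hypothetical and are not taken from the paper; the scorer is supplied by the caller and stands in for whatever ranking function is actually used.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def build_comparison_pairs(
    source: str,
    candidates: List[str],
    rank_score: Callable[[str, str], float],  # hypothetical scorer, not the paper's
    min_margin: float = 0.0,
) -> List[Tuple[str, str]]:
    """Return (preferred, rejected) rewrite pairs for one source text.

    Every pair of candidates is compared by score; pairs whose score gap
    is at most `min_margin` are skipped as ties or near-ties.
    """
    scored = [(rank_score(source, cand), cand) for cand in candidates]
    pairs: List[Tuple[str, str]] = []
    for (score_a, cand_a), (score_b, cand_b) in combinations(scored, 2):
        if abs(score_a - score_b) <= min_margin:
            continue  # too close to call; no preference label
        if score_a > score_b:
            pairs.append((cand_a, cand_b))
        else:
            pairs.append((cand_b, cand_a))
    return pairs


if __name__ == "__main__":
    # Toy scorer (an assumption for the demo): longer rewrites score higher.
    demo = build_comparison_pairs(
        "source text",
        ["a", "bbb", "cc"],
        rank_score=lambda src, cand: float(len(cand)),
    )
    for preferred, rejected in demo:
        print(f"{preferred!r} preferred over {rejected!r}")
```

Pairs in this shape are what a pairwise reward-model loss typically consumes; raising `min_margin` trades the number of pairs for cleaner preference labels.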

Code & Models

Repositories

google-research/google-research
tf · Official

Datasets

gabrielmbmb/OpenRewriteEval
dataset · 9 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: ALIGN · Focus