mEdIT: Multilingual Text Editing via Instruction Tuning

Vipul Raheja; Dimitris Alikaniotis; Vivek Kulkarni; Bashar Alhafni; Dhruv Kumar

arXiv:2402.16472·cs.CL·April 18, 2024·2 cites

TL;DR

mEdIT is a multilingual text editing model that fine-tunes large pre-trained language models using instruction tuning, enabling effective text editing across multiple languages and tasks.

Contribution

The paper introduces mEdIT, a novel multilingual text editing model trained with instruction tuning on diverse datasets, improving cross-lingual generalization and performance.

Findings

01. Strong performance on multilingual text editing benchmarks
02. Effective generalization to new languages
03. Public release of data, code, and models

Abstract

We introduce mEdIT, a multi-lingual extension to CoEdIT -- the recent state-of-the-art text editing models for writing assistance. mEdIT models are trained by fine-tuning multi-lingual large, pre-trained language models (LLMs) via instruction tuning. They are designed to take instructions from the user specifying the attributes of the desired text in the form of natural language instructions, such as Grammatik korrigieren (German) or Parafrasee la oración (Spanish). We build mEdIT by curating data from multiple publicly available human-annotated text editing datasets for three text editing tasks (Grammatical Error Correction (GEC), Text Simplification, and Paraphrasing) across diverse languages belonging to six different language families. We detail the design and training of mEdIT models and demonstrate their strong performance on many multi-lingual text editing benchmarks against…
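
The abstract describes models that take a natural language instruction together with the text to edit. A minimal sketch of how such an input might be assembled is given below. The two instruction strings come from the abstract; the helper name `build_edit_prompt`, the task and language keys, and the `"<instruction>: <text>"` layout are assumptions made for illustration and are not confirmed by this page.

```python
# Illustrative sketch only. The "<instruction>: <text>" layout and the
# (task, language) keys are assumptions, not a format confirmed by the source.
INSTRUCTIONS = {
    ("gec", "de"): "Grammatik korrigieren",         # German example from the abstract
    ("paraphrase", "es"): "Parafrasee la oración",  # Spanish example from the abstract
}


def build_edit_prompt(task: str, lang: str, text: str) -> str:
    """Prefix the text to edit with the instruction for the given task and language."""
    instruction = INSTRUCTIONS[(task, lang)]  # raises KeyError for unknown pairs
    return f"{instruction}: {text.strip()}"


if __name__ == "__main__":
    print(build_edit_prompt("gec", "de", "Ich habe gestern ein Buch gelesen haben."))
```

The resulting string would then be passed to an instruction-tuned model as its input; which model and which decoding settings to use are outside what this page specifies.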

Code & Models

Repositories

vipulraheja/medit
PyTorch · Official

Models

Datasets

grammarly/medit
dataset · 70 downloads

Videos

mEdIT: Multilingual Text Editing via Instruction Tuning · Underline

Taxonomy

Topics: Natural Language Processing Techniques