Spivavtor: An Instruction Tuned Ukrainian Text Editing Model

Aman Saini; Artem Chernodub; Vipul Raheja; Vivek Kulkarni

arXiv:2404.18880·cs.CL·April 30, 2024·1 cites



TL;DR

Spivavtor is an instruction-tuned Ukrainian text editing model, adapted from the English-only CoEdIT model. The authors report superior performance on Grammatical Error Correction (GEC), text simplification, coherence, and paraphrasing, and publicly release their best-performing models and data.

Contribution

The paper introduces Spivavtor, a Ukrainian-focused adaptation of CoEdIT, including a new dataset and models, advancing Ukrainian NLP capabilities in text editing tasks.

Findings

01 Spivavtor outperforms existing models on Ukrainian text editing tasks.

02 The dataset and models are publicly available for research.

03 Spivavtor effectively handles GEC, simplification, coherence, and paraphrasing in Ukrainian.

Abstract

We introduce Spivavtor, a dataset, and instruction-tuned models for text editing focused on the Ukrainian language. Spivavtor is the Ukrainian-focused adaptation of the English-only CoEdIT model. Similar to CoEdIT, Spivavtor performs text editing tasks by following instructions in Ukrainian. This paper describes the details of the Spivavtor-Instruct dataset and Spivavtor models. We evaluate Spivavtor on a variety of text editing tasks in Ukrainian, such as Grammatical Error Correction (GEC), Text Simplification, Coherence, and Paraphrasing, and demonstrate its superior performance on all of them. We publicly release our best-performing models and data as resources to the community to advance further research in this space.
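The instruction-following setup described in the abstract can be illustrated with a minimal sketch of how an input might be assembled: a Ukrainian instruction is prepended to the text to be edited, and the combined string is what an instruction-tuned model would receive. The instruction wordings, the task keys, the `instruction: text` layout, and the helper name `build_edit_prompt` are all assumptions made for illustration; they are not taken from the Spivavtor-Instruct dataset.

```python
# Hypothetical instruction templates, one per editing task named in the abstract.
# The Ukrainian wording is illustrative and NOT taken from Spivavtor-Instruct.
INSTRUCTIONS = {
    "gec": "Виправте граматику в цьому реченні",
    "simplification": "Спростіть це речення",
    "coherence": "Зробіть цей текст більш зв'язним",
    "paraphrase": "Перефразуйте це речення",
}


def build_edit_prompt(task: str, text: str) -> str:
    """Prepend the instruction for `task` to `text` (assumed 'instruction: text' layout)."""
    if task not in INSTRUCTIONS:
        raise ValueError(f"unknown task: {task!r}")
    # Strip surrounding whitespace so the prompt has a single, predictable separator.
    return f"{INSTRUCTIONS[task]}: {text.strip()}"


if __name__ == "__main__":
    # Example: a sentence with an agreement error, framed as a GEC request.
    print(build_edit_prompt("gec", "Я йду до школа."))
```

The resulting string would then be passed to whichever released model checkpoint is in use; the loading and generation calls are omitted here because the page does not specify them.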


Code & Models

Datasets

andrian-kr/Social-Chemistry-101_care-harm (dataset, 2 downloads)


Taxonomy

Topics: Natural Language Processing Techniques · Digital Humanities and Scholarship · Digital Rights Management and Security