Taming CATS: Controllable Automatic Text Simplification through Instruction Fine-Tuning with Control Tokens

Hanna Hubarava; Yingqiang Gao

arXiv:2604.01779 · cs.CL · April 3, 2026

TL;DR

This paper introduces a domain-agnostic framework for controllable automatic text simplification (CATS) that uses instruction fine-tuning with discrete control tokens to steer open-source models toward target readability levels and compression rates, evaluated across three model families and four domains.

Contribution

The paper proposes instruction fine-tuning with discrete control tokens to improve controllability in text simplification, and argues that controllability is constrained by training data and evaluation rather than being only a decoding problem.

Findings

01. Smaller models (1-3B) can be competitive in controllability.

02. Readability control is reliably learned, but compression control underperforms.

03. Standard metrics are insufficient; error-based measures are proposed for better evaluation.
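
One plausible reading of an "error-based" measure is the gap between the attribute value that was requested and the value the output actually achieves. The sketch below computes a mean absolute error over such pairs. The function name `control_error` and the choice of mean absolute error are assumptions made for illustration; the summary above does not state the paper's exact definition.

```python
from statistics import mean


def control_error(targets, achieved):
    """Mean absolute gap between requested and achieved attribute values.

    Hypothetical helper: `targets` holds the values asked for via control
    tokens (e.g. a readability grade), `achieved` the values measured on
    the model outputs, in the same order.
    """
    if len(targets) != len(achieved):
        raise ValueError("targets and achieved must have the same length")
    if not targets:
        raise ValueError("need at least one (target, achieved) pair")
    return mean(abs(t - a) for t, a in zip(targets, achieved))
```

A lower value means the outputs land closer to the requested level, which is something an overlap-based quality metric would not show directly.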

Abstract

Controllable Automatic Text Simplification (CATS) produces user-tailored outputs, yet controllability is often treated as a decoding problem and evaluated with metrics that do not reflect the degree of control. We observe that controllability in ATS is significantly constrained by data and evaluation. To this end, we introduce a domain-agnostic CATS framework based on instruction fine-tuning with discrete control tokens, steering open-source models to target readability levels and compression rates. Across three model families of different sizes (Llama, Mistral, Qwen; 1-14B) and four domains (medicine, public administration, news, encyclopedic text), we find that smaller models (1-3B) can be competitive, but reliable controllability strongly depends on whether the training data encodes sufficient variation in the target attribute. Readability control (FKGL, ARI,…
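
To make the two controlled attributes concrete, the following sketch computes the Automated Readability Index (ARI) with its standard formula, a word-level compression ratio, and a prompt prefixed with discrete control tokens. The token format (`<ARI_8>`, `<COMP_0.6>`), the "Simplify:" instruction wording, the word-level definition of compression, and all function names are assumptions for illustration only; the abstract does not specify how the authors encode or measure them.

```python
import re


def ari(text):
    """Automated Readability Index with naive word and sentence splitting."""
    words = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # ARI counts letters and digits only, not punctuation.
    chars = sum(ch.isalnum() for w in words for ch in w)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def compression_ratio(source, output):
    """Output length over source length, in whitespace-separated words (assumed definition)."""
    n_source = len(source.split())
    if n_source == 0:
        raise ValueError("source must not be empty")
    return len(output.split()) / n_source


def build_prompt(source, target_ari, target_ratio):
    """Prefix the instruction with hypothetical discrete control tokens."""
    return f"<ARI_{target_ari}> <COMP_{target_ratio:.1f}> Simplify: {source}"
```

During fine-tuning, the targets passed to `build_prompt` would typically be measured on the reference simplification, so the model learns to associate each token with outputs at that level. This is also why the training data needs enough variation in the attribute for the tokens to carry a signal.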
