Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models

Joseph Marvin Imperial; Harish Tayyar Madabushi

arXiv:2309.05454 · cs.CL · November 7, 2023 · 5 citations

TL;DR

This paper evaluates how well instruction-tuned language models align with readability standards like FKGL and CEFR when performing educational tasks, revealing differences in effectiveness among models.

Contribution

It provides an empirical comparison of various instruction-tuned models' ability to generate readable educational content using standard readability metrics.

Findings

1. ChatGPT may be less effective at these tasks and may need more refined prompts.
2. Open-source models such as BLOOMZ and FlanT5 show promising results by comparison.
3. Models vary significantly in how well they align with readability standards.

Abstract

Readability metrics and standards such as Flesch Kincaid Grade Level (FKGL) and the Common European Framework of Reference for Languages (CEFR) exist to guide teachers and educators to properly assess the complexity of educational materials before administering them for classroom use. In this study, we select a diverse set of open and closed-source instruction-tuned language models and investigate their performance in writing story completions and simplifying narratives (tasks that teachers perform) using standard-guided prompts controlling text readability. Our extensive findings provide empirical proof of how globally recognized models like ChatGPT may be considered less effective and may require more refined prompts for these generative tasks compared to other open-source models such as BLOOMZ and FlanT5, which have shown promising results.
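
For readers unfamiliar with FKGL, the metric mentioned in the abstract can be sketched in a few lines of Python. The formula itself is the standard one (0.39 × words per sentence + 11.8 × syllables per word − 15.59). The sentence splitter and the vowel-group syllable counter below are simplifying assumptions made for illustration, and the function names are hypothetical; this is not taken from the paper's repository, and a dedicated readability library would likely give somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a rough heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Assume a trailing "e" is silent (e.g. "make"), except in "-le" endings (e.g. "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch Kincaid Grade Level of `text`, using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    print(f"FKGL: {fkgl(sample):.2f}")
```

A lower score corresponds to text readable at an earlier school grade, so a generated story can be scored this way and compared against the grade level requested in the prompt.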

Code & Models

Repositories

imperialite/readability-standard-alignment (PyTorch, official)

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques

Methods: BLOOMZ