TL;DR
MuTSE is an interactive web tool that enables systematic, multi-dimensional evaluation of LLM-generated text simplifications across various prompts and models, aiding researchers and educators.
Contribution
The paper introduces MuTSE, a human-in-the-loop platform for visualizing and comparing text simplification outputs across multiple prompt-model permutations.
Findings
Supports concurrent execution of prompt-model permutations
Provides a real-time comparison matrix for simplification outputs
Includes a semantic alignment engine with a linearity bias heuristic
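The summary does not describe how these features are implemented, so the following is only a minimal sketch of the two ideas under stated assumptions. `fake_simplify`, `run_permutations`, `token_overlap`, `align`, and the `bias` parameter are hypothetical names, not MuTSE's API. Token overlap stands in for whatever semantic similarity measure the tool actually uses, and the "linearity bias" is interpreted here as a penalty on alignments that stray from the same relative position in the source and simplified texts.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fake_simplify(model: str, prompt: str, text: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns a labeled echo."""
    return f"[{model}|{prompt}] {text}"


def run_permutations(models, prompts, text, call=fake_simplify):
    """Run every (prompt, model) pair concurrently; return outputs keyed by pair."""
    pairs = list(product(prompts, models))
    with ThreadPoolExecutor(max_workers=max(len(pairs), 1)) as pool:
        futures = {pair: pool.submit(call, pair[1], pair[0], text) for pair in pairs}
        return {pair: fut.result() for pair, fut in futures.items()}


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens; a crude proxy for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def align(source, simplified, bias=0.3):
    """Map each simplified sentence to one source sentence index.

    Score = similarity minus a penalty proportional to the distance between
    the two sentences' relative positions (the assumed linearity bias).
    """
    result = []
    for j, simp in enumerate(simplified):
        pos_j = j / max(len(simplified) - 1, 1)

        def score(i, simp=simp, pos_j=pos_j):
            pos_i = i / max(len(source) - 1, 1)
            return token_overlap(source[i], simp) - bias * abs(pos_i - pos_j)

        result.append(max(range(len(source)), key=score))
    return result


if __name__ == "__main__":
    outputs = run_permutations(["m1", "m2"], ["p1", "p2", "p3"], "hello")
    print(len(outputs))  # one output per prompt-model pair

    source = ["the cat sat down", "rain fell all day", "the cat sat down"]
    simplified = ["the cat sat", "rain fell", "the cat sat"]
    print(align(source, simplified))            # position penalty breaks the tie
    print(align(source, simplified, bias=0.0))  # similarity only
```

In this toy input the first and last source sentences are identical, so similarity alone cannot tell them apart. With the position penalty the last simplified sentence maps to the last source sentence; with `bias=0.0` it falls back to the first match.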
Abstract
As Large Language Models (LLMs) become increasingly prevalent in text simplification, systematically evaluating their outputs across diverse prompting strategies and architectures remains a critical methodological challenge in both NLP research and Intelligent Tutoring Systems (ITS). Developing robust prompts is often hindered by the absence of structured, visual frameworks for comparative text analysis. While researchers typically rely on static computational scripts, educators are constrained to standard conversational interfaces; neither paradigm supports systematic multi-dimensional evaluation of prompt-model permutations. To address these limitations, we introduce MuTSE (the project code and the demo have been made available for peer review at the following anonymized URL: https://osf.io/njs43/overview?view_only=4b4655789f484110a942ebb7788cdf2a), an interactive…
