Textual Aesthetics in Large Language Models

Lingjie Jiang; Shaohan Huang; Xun Wu; Furu Wei

arXiv:2411.02930·cs.CL·November 6, 2024

TL;DR

This paper introduces a new approach for improving textual aesthetics in large language models through a dedicated dataset, a fine-tuning method called TAPO, and evaluation techniques, leading to better aesthetic and overall performance.

Contribution

It presents a novel aesthetics-polishing pipeline, the TexAes dataset, and the TAPO fine-tuning method, which together enhance textual aesthetics in LLMs without sacrificing content correctness.

Findings

01. Textual aesthetics can be quantitatively improved in LLMs.

02. TAPO fine-tuning enhances aesthetic scores and general performance.

03. Evaluation methods effectively measure textual aesthetics.

Abstract

Image aesthetics is a crucial metric in the field of image generation. However, textual aesthetics has not been sufficiently explored. With the widespread application of large language models (LLMs), previous work has primarily focused on the correctness of content and the helpfulness of responses. Nonetheless, providing responses with textual aesthetics is also an important factor for LLMs, which can offer a cleaner layout and ensure greater consistency and coherence in content. In this work, we introduce a pipeline for aesthetics polishing and help construct a textual aesthetics dataset named TexAes. We propose a textual aesthetics-powered fine-tuning method based on direct preference optimization, termed TAPO, which leverages textual aesthetics without compromising content correctness. Additionally, we develop two evaluation methods for textual aesthetics based on text and image…
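The abstract says only that TAPO is based on direct preference optimization (DPO); it does not give TAPO's own objective here. As background, the standard per-pair DPO loss can be sketched as below. This is a minimal illustration, not the authors' method: the function name `dpo_loss`, the default `beta=0.1`, and the use of summed sequence log-probabilities as inputs are all assumptions made for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Hypothetical helper for illustration; this is plain DPO, not the
    TAPO objective. Inputs are log-probabilities of the preferred
    ("chosen") and dispreferred ("rejected") responses under the policy
    being trained and under a frozen reference model.
    """
    # Implicit rewards: how much more the policy favours each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)) = softplus(-margin),
    # computed in a numerically stable way for either sign of margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In an aesthetics setting, the "chosen" response would presumably be the better-formatted one of a pair, so minimizing this loss pushes the policy toward it relative to the reference model; how TAPO modifies or extends this is not stated in the text above.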

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 5

Strengths

+ The paper is well organized.
+ The problem of textual aesthetics in LLMs is interesting.

Weaknesses

- I have doubts about the textual aesthetics scores. The scores should be decided by humans, not by ChatGPT.
- The proposed textual aesthetics-powered training actually aims to predict scores as close as possible to ChatGPT's, not to human judgments.
- The authors did mention 3 evaluators: 2 graduate students and one professor. First, the number of evaluators is too small. Second, there is no information about the evaluators, for example, age, background, first language, and expertise.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

Originality
• The paper shows originality in addressing textual aesthetics in LLMs, an area that has received less attention compared to image aesthetics. The construction of the TEXAES dataset and the proposed TAPO fine-tuning method are novel contributions.

Quality
• The research methodology appears to be of good quality. The construction of the dataset through an aesthetic polishing pipeline and the use of appropriate evaluation methods (text-based and image-based) demonstrate a systematic ap

Weaknesses

Dataset Limitations
• While the construction of the TEXAES dataset is a significant step, it may have limitations. The dataset is built based on a filtered version of UltraFeedback, and there could be potential biases introduced during this process. For example, the responses in UltraFeedback might already have a certain style or pattern that could limit the diversity of the aesthetic preferences captured in TEXAES.

Evaluation Complexity
• The evaluation methods, although comprehensive with text

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. The paper is the first to investigate textual aesthetics in LLMs, introducing the TEXAES dataset and TAPO fine-tuning method, providing a new direction for the aesthetic optimization of LLMs.
2. The paper empirically validates the effectiveness of the TEXAES dataset and TAPO method, demonstrating not only improved aesthetic scores but also enhanced model performance.
3. The paper develops two assessment methods based on text and image analysis, offering tools for a comprehensive evaluation of

Weaknesses

1. The paper does not elaborate on the quantitative standards for textual aesthetics, that is, what constitutes good textual aesthetics, and how the consistency among human evaluators in textual aesthetics assessment is ensured.
2. From the examples shown in Figure 4, the paper's concept of textual aesthetics seems to involve only line breaks, bold fonts, and highlighting key points, which are relatively simple and may lack long-term research value.
3. The paper does not clearly explain how to d

Code & Models

Repositories

JackLingjie/Textual-Aesthetics
Official

Datasets

lingjie23/TexAes
dataset · 12 dl

Videos

Textual Aesthetics in Large Language Models · underline

Taxonomy

Topics

Topic Modeling · Multimodal Machine Learning Applications