Textual Aesthetics in Large Language Models
Lingjie Jiang, Shaohan Huang, Xun Wu, Furu Wei

TL;DR
This paper introduces a new approach for improving textual aesthetics in large language models through a dedicated dataset, a fine-tuning method called TAPO, and evaluation techniques, leading to better aesthetic and overall performance.
Contribution
It presents an aesthetics-polishing pipeline, the TexAes dataset, and the TAPO fine-tuning method, which together enhance textual aesthetics in LLMs without sacrificing content correctness.
Findings
Textual aesthetics can be quantitatively improved in LLMs.
TAPO fine-tuning enhances aesthetic scores and general performance.
The two proposed evaluation methods, one text-based and one image-based, effectively measure textual aesthetics.
Abstract
Image aesthetics is a crucial metric in the field of image generation. However, textual aesthetics has not been sufficiently explored. With the widespread application of large language models (LLMs), previous work has primarily focused on the correctness of content and the helpfulness of responses. Nonetheless, providing responses with textual aesthetics is also an important factor for LLMs, which can offer a cleaner layout and ensure greater consistency and coherence in content. In this work, we introduce a pipeline for aesthetics polishing and help construct a textual aesthetics dataset named TexAes. We propose a textual aesthetics-powered fine-tuning method based on direct preference optimization, termed TAPO, which leverages textual aesthetics without compromising content correctness. Additionally, we develop two evaluation methods for textual aesthetics based on text and image…
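The abstract states that TAPO is based on direct preference optimization (DPO). As a rough illustration of that base objective only, the sketch below computes the standard DPO loss for a single preference pair. It is not the authors' TAPO objective, whose details are not given here. The function name `dpo_loss`, the argument names, and the default `beta=0.1` are hypothetical, and treating the aesthetically polished response as "chosen" and the original response as "rejected" is an assumption about how TexAes pairs would be used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each full response under the
    policy being trained and under a frozen reference model.
    Assumption: "chosen" is the aesthetically polished response and
    "rejected" is the unpolished one.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# With no preference shift relative to the reference, the loss is log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# Favoring the chosen response more than the reference does lowers the loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In this form the loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model; `beta` controls how strongly deviations from the reference are rewarded.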
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
+ The paper is well organized.
+ The problem of textual aesthetics in LLMs is interesting.
- I have doubts about the textual aesthetics scores. The scores should be decided by humans, not by ChatGPT.
- The proposed textual aesthetics-powered training actually aims to predict scores as close as possible to ChatGPT's, not to human judgments.
- The authors did mention three evaluators: two graduate students and one professor. First, the number of evaluators is too small. Second, there is no information about the evaluators, for example, age, background, first language, and expertise.
Originality
• The paper shows originality in addressing textual aesthetics in LLMs, an area that has received less attention compared to image aesthetics. The construction of the TEXAES dataset and the proposed TAPO fine-tuning method are novel contributions.
Quality
• The research methodology appears to be of good quality. The construction of the dataset through an aesthetic polishing pipeline and the use of appropriate evaluation methods (text-based and image-based) demonstrate a systematic ap
Dataset Limitations
• While the construction of the TEXAES dataset is a significant step, it may have limitations. The dataset is built based on a filtered version of UltraFeedback, and there could be potential biases introduced during this process. For example, the responses in UltraFeedback might already have a certain style or pattern that could limit the diversity of the aesthetic preferences captured in TEXAES.
Evaluation Complexity
• The evaluation methods, although comprehensive with text
1. The paper is the first to investigate textual aesthetics in LLMs, introducing the TEXAES dataset and TAPO fine-tuning method, providing a new direction for the aesthetic optimization of LLMs.
2. The paper empirically validates the effectiveness of the TEXAES dataset and TAPO method, demonstrating not only improved aesthetic scores but also enhanced model performance.
3. The paper develops two assessment methods based on text and image analysis, offering tools for a comprehensive evaluation of
1. The paper does not elaborate on the quantitative standards for textual aesthetics, that is, what constitutes good textual aesthetics, and how the consistency among human evaluators in textual aesthetics assessment is ensured.
2. From the examples shown in Figure 4, the paper's concept of textual aesthetics seems to involve only line breaks, bold fonts, and highlighting key points, which are relatively simple and may lack long-term research value.
3. The paper does not clearly explain how to d
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications
