Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation

Tomohito Kasahara; Daisuke Kawahara

arXiv:2310.11026 · cs.CL · October 18, 2023 · 1 cites

TL;DR

This paper investigates how well decoder-based large language models work for the automatic evaluation of text generation. It finds that, compared to encoder-based models, they focus on surface features and struggle with semantic nuances.

Contribution

It provides a comparative analysis of decoder-based versus encoder-based models for automatic evaluation across two tasks (machine translation evaluation and semantic textual similarity) and two languages (Japanese and English), highlighting the limitations of large decoder-based models.

Findings

01. Tuned decoder-based models perform worse than tuned encoder-based models on the evaluation tasks.

02. The analysis suggests that decoder-based models focus on surface word sequences and do not capture meaning.

03. Very large decoder-based models such as ChatGPT have difficulty capturing fine-grained semantic differences.

Abstract

Automatic evaluation of text generation is essential for improving the accuracy of generation tasks. In light of the current trend towards increasingly larger decoder-based language models, we investigate automatic evaluation methods based on such models for text generation. This paper compares various methods, including tuning with encoder-based models and large language models under equal conditions, on two different tasks, machine translation evaluation and semantic textual similarity, in two languages, Japanese and English. Experimental results show that compared to the tuned encoder-based models, the tuned decoder-based models perform poorly. The analysis of the causes for this suggests that the decoder-based models focus on surface word sequences and do not capture meaning. It is also revealed that in-context learning of very large decoder-based models such as ChatGPT makes it…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Focus