Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation
Tomohito Kasahara, Daisuke Kawahara

TL;DR
This paper investigates how well decoder-based large language models work for the automatic evaluation of text generation. It finds that tuned decoder-based models focus on surface word sequences and capture meaning less well than tuned encoder-based models.
Contribution
The paper compares decoder-based and encoder-based models for automatic evaluation under equal conditions, on two tasks (machine translation evaluation and semantic textual similarity) in two languages (Japanese and English), and highlights the limitations of large decoder-based models.
Findings
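Tuned decoder-based models perform worse than tuned encoder-based models on the evaluation tasks studied (machine translation evaluation and semantic textual similarity).
The analysis suggests that decoder-based models focus on surface word sequences and do not capture meaning.
Very large decoder-based models such as ChatGPT, used with in-context learning, have difficulty capturing fine-grained semantic differences.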
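To make the "surface word sequences versus meaning" distinction concrete, here is a minimal sketch in Python. It is not the paper's method or any of its models: the token-overlap score, the function name token_jaccard, and the three example sentences are all assumptions chosen for illustration. It only shows how a purely surface-level score can rate a negated sentence as nearly identical to the reference while rating a genuine paraphrase as unrelated.

```python
def token_jaccard(a: str, b: str) -> float:
    """Surface-level similarity: Jaccard overlap of lowercased whitespace tokens.

    Illustrative stand-in for a metric that only sees word overlap;
    it is not one of the evaluators studied in the paper.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are trivially identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical sentences, invented for this example.
reference = "the drug is effective for patients"
negation = "the drug is not effective for patients"      # opposite meaning
paraphrase = "this medicine works well on sick people"   # similar meaning

surface_negation = token_jaccard(reference, negation)
surface_paraphrase = token_jaccard(reference, paraphrase)

print(f"reference vs negation:   {surface_negation:.3f}")
print(f"reference vs paraphrase: {surface_paraphrase:.3f}")
```

The negated sentence shares almost every token with the reference, so the surface score ranks it far above the paraphrase, which is the opposite of a meaning-based ranking. An evaluator that leans on word sequences in this way would be expected to show the same kind of failure.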
Abstract
Automatic evaluation of text generation is essential for improving the accuracy of generation tasks. In light of the current trend towards increasingly larger decoder-based language models, we investigate automatic evaluation methods based on such models for text generation. This paper compares various methods, including tuning with encoder-based models and large language models under equal conditions, on two different tasks, machine translation evaluation and semantic textual similarity, in two languages, Japanese and English. Experimental results show that compared to the tuned encoder-based models, the tuned decoder-based models perform poorly. The analysis of the causes for this suggests that the decoder-based models focus on surface word sequences and do not capture meaning. It is also revealed that in-context learning of very large decoder-based models such as ChatGPT makes it…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Focus
