Meta Semantic Template for Evaluation of Large Language Models
Yachuan Liu, Liang Chen, Jindong Wang, Qiaozhu Mei, Xing Xie

TL;DR
This paper introduces MSTemp, a novel method for evaluating large language models' semantic understanding by generating out-of-distribution evaluation sets through semantic templates, revealing models' limitations.
Contribution
MSTemp provides a flexible, dynamic, and cost-effective approach to create OOD evaluation datasets for assessing LLMs' semantic comprehension.
Findings
MSTemp-generated samples significantly reduce LLM performance.
The approach effectively tests LLMs beyond existing benchmark datasets.
Initial experiments demonstrate the method's potential for LLM evaluation.
Abstract
Do large language models (LLMs) genuinely understand the semantics of language, or do they just memorize their training data? Recent concern over potential data contamination in LLMs has raised the community's awareness of the need for research on LLM evaluation. In this paper, we propose MSTemp, an approach that creates meta semantic templates to evaluate the semantic understanding ability of LLMs. The core of MSTemp is not to perform evaluation directly on existing benchmark datasets, but to generate new out-of-distribution (OOD) evaluation sets using existing datasets as seeds. Specifically, for a given sentence, MSTemp leverages another language model to generate new samples while preserving its semantics. The new samples are called semantic templates of the original sentence. Then, MSTemp generates evaluation samples via sentence parsing and random word replacement on the semantic…
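The abstract is cut off before it gives details, so the following is only a minimal sketch of the second step it describes (random word replacement on parsed templates), not the authors' implementation. Everything named here is a hypothetical stand-in: the hand-tagged `TEMPLATES` replace both the paraphrasing language model and the sentence parser, and `VOCAB` is a toy replacement vocabulary. The sketch also ignores whether a replacement changes the task label of the sample.

```python
import random
from typing import Dict, List, Optional, Tuple

# (word, slot tag); a tag of None marks a word that must stay fixed.
Token = Tuple[str, Optional[str]]

# Hypothetical semantic templates for the seed sentence "The movie was boring".
# In the described method these would come from another language model plus a
# sentence parser; here they are written and tagged by hand.
TEMPLATES: List[List[Token]] = [
    [("The", None), ("movie", "NOUN"), ("was", None), ("boring", "ADJ")],
    [("What", None), ("a", None), ("boring", "ADJ"), ("movie", "NOUN")],
]

# Hypothetical replacement vocabulary, keyed by slot tag.
VOCAB: Dict[str, List[str]] = {
    "NOUN": ["film", "novel", "meal", "concert"],
    "ADJ": ["dull", "brilliant", "forgettable", "charming"],
}


def fill_template(template: List[Token],
                  vocab: Dict[str, List[str]],
                  rng: random.Random) -> str:
    """Replace each tagged word with a random word of the same tag."""
    words = []
    for word, tag in template:
        if tag is not None and vocab.get(tag):
            words.append(rng.choice(vocab[tag]))
        else:
            # Fixed word, or no candidates for this tag: keep the original.
            words.append(word)
    return " ".join(words)


def generate_samples(templates: List[List[Token]],
                     vocab: Dict[str, List[str]],
                     n_per_template: int,
                     seed: int = 0) -> List[str]:
    """Draw n_per_template random fillings from every template."""
    rng = random.Random(seed)
    return [fill_template(t, vocab, rng)
            for t in templates
            for _ in range(n_per_template)]


if __name__ == "__main__":
    for sample in generate_samples(TEMPLATES, VOCAB, n_per_template=3):
        print(sample)
```

Because each template has several slots and each slot has several candidates, a small seed set can yield many distinct evaluation sentences, which is one plausible reading of why the approach is described as dynamic and cost-effective.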
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
