Sensitivity and Robustness of Large Language Models to Prompt Template in Japanese Text Classification Tasks
Chengguang Gan, Tatsunori Mori

TL;DR
This paper evaluates the sensitivity and robustness of large language models, especially GPT-4, to prompt template variations in Japanese text classification, revealing significant stability issues and performance drops.
Contribution
It provides a comprehensive analysis of how prompt template modifications affect LLM performance in Japanese, highlighting stability issues in current models like GPT-4.
Findings
GPT-4 accuracy drops from 49.21% to 25.44% with prompt variation
Large language models show significant sensitivity to prompt structure in Japanese
Current models face stability challenges in multilingual prompt tasks
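The headline GPT-4 numbers can be put in perspective with a quick calculation. This sketch uses only the two accuracies reported above. It computes the absolute drop in percentage points and the drop relative to the original accuracy.

```python
# Reported GPT-4 accuracy (%) before and after the prompt template was varied.
original_acc = 49.21
modified_acc = 25.44

# Absolute drop, in percentage points.
absolute_drop = original_acc - modified_acc

# Relative drop, as a percentage of the original accuracy.
relative_drop = absolute_drop / original_acc * 100

print(f"Absolute drop: {absolute_drop:.2f} percentage points")
print(f"Relative drop: {relative_drop:.1f}%")
```

In other words, the template change cost GPT-4 about 23.77 points, which is roughly 48% of its original accuracy.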
Abstract
Research on prompt engineering has surged in recent years, driven primarily by advances in pre-trained language models and large language models. However, a critical issue has been identified in this domain: these models are highly sensitive to prompt templates and insufficiently robust to changes in them, particularly in less-studied languages such as Japanese. This paper explores this issue through a comprehensive evaluation of several representative large language models (LLMs) and a widely used pre-trained language model (PLM). The models are evaluated on a Japanese benchmark dataset, with the aim of assessing and analyzing the performance of current multilingual models in this setting. Our experimental results reveal startling discrepancies: a simple change to the sentence structure of the prompt template led to a drastic drop in the accuracy of GPT-4…
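The kind of evaluation described in the abstract can be sketched as a small harness. It runs the same labeled examples through several prompt templates and reports each template's accuracy, along with the spread between the best and worst template. This is a minimal sketch under stated assumptions, not the authors' setup. The templates, the `toy_classify` stand-in for a model, and the helper names are all hypothetical. In practice `classify` would wrap a call to an LLM.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical templates: same task, different sentence structure.
# These are illustrative assumptions, not the prompts used in the paper.
TEMPLATES: Dict[str, str] = {
    "instruction_first": "Classify the sentiment of the review as positive or negative.\nReview: {text}\nAnswer:",
    "review_first": "Review: {text}\nIs the sentiment of this review positive or negative?\nAnswer:",
}


def accuracy(template: str,
             examples: Sequence[Tuple[str, str]],
             classify: Callable[[str], str]) -> float:
    """Fill `template` for each (text, gold_label) pair and return the fraction classified correctly."""
    correct = 0
    for text, gold in examples:
        prediction = classify(template.format(text=text))
        if prediction.strip().lower() == gold:
            correct += 1
    return correct / len(examples)


def template_sensitivity(templates: Dict[str, str],
                         examples: Sequence[Tuple[str, str]],
                         classify: Callable[[str], str]) -> Tuple[Dict[str, float], float]:
    """Return per-template accuracy and the max-min accuracy spread across templates."""
    scores = {name: accuracy(t, examples, classify) for name, t in templates.items()}
    spread = max(scores.values()) - min(scores.values())
    return scores, spread


# Hypothetical stand-in for a model: it reads only the first line of the prompt,
# so its predictions depend on where the review text appears in the template.
def toy_classify(prompt: str) -> str:
    first_line = prompt.splitlines()[0]
    return "positive" if "great" in first_line else "negative"


examples = [("The food was great", "positive"), ("Terrible service", "negative")]
scores, spread = template_sensitivity(TEMPLATES, examples, toy_classify)
print(scores, spread)
```

With the toy classifier, the "instruction_first" template scores 0.5 and "review_first" scores 1.0, a spread of 0.5. Reordering the sentences alone changes the result, which is the kind of brittleness the paper reports for real models.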
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Absolute Position Encodings · Adafactor · Softmax · Layer Normalization · Inverse Square Root Schedule · Byte Pair Encoding
