Paraphrase Types Elicit Prompt Engineering Capabilities
Jan Philip Wahle, Terry Ruas, Yang Xu, Bela Gipp

TL;DR
This paper systematically investigates how different paraphrase types affect language model performance, revealing that specific linguistic variations can significantly enhance task outcomes and inform better prompt engineering strategies.
Contribution
It provides the first comprehensive empirical analysis of how various paraphrase types influence language model behavior, highlighting key linguistic features that improve prompt effectiveness.
Findings
Morphology and lexicon changes improve prompts significantly.
Adapting prompts to specific paraphrase types yields potential median gains of 6.7% for Mixtral 8x7B and 5.5% for LLaMA 3 8B.
Prompt variability impacts model robustness and task performance.
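As a rough illustration of how a per-family median gain like the ones above could be computed, the sketch below compares per-task scores for an original prompt against scores for paraphrased prompts. Everything in it is assumed for illustration: the task names, the scores, the `median_gain` helper, and the definition of gain as a percentage-point difference. It is not the authors' evaluation code and the paper may define gain differently.

```python
from statistics import median

# Hypothetical per-task scores (in percent) for the original prompts.
baseline = {"task_a": 50, "task_b": 40, "task_c": 80}

# Hypothetical per-task scores after paraphrasing the prompt within one
# paraphrase family. Only two of the six families are shown here.
paraphrased = {
    "morphology": {"task_a": 56, "task_b": 43, "task_c": 80},
    "lexicon": {"task_a": 52, "task_b": 45, "task_c": 79},
}


def median_gain(original, variant):
    """Median per-task difference (variant minus original), in points.

    Assumption: gain is an absolute percentage-point difference per task.
    """
    gains = [variant[task] - original[task] for task in original]
    return median(gains)


# Median gain per paraphrase family across all tasks.
gains_by_family = {
    family: median_gain(baseline, scores)
    for family, scores in paraphrased.items()
}

for family, gain in gains_by_family.items():
    print(f"{family}: median gain {gain} points")
```

The median is used here instead of the mean so that a single task with an unusually large change does not dominate the summary.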
Abstract
Much of the success of modern language models depends on finding a suitable prompt to instruct the model. Until now, it has been largely unknown how variations in the linguistic expression of prompts affect these models. This study systematically and empirically evaluates which linguistic features influence models through paraphrase types, i.e., different linguistic changes at particular positions. We measure behavioral changes for five models across 120 tasks and six families of paraphrases (i.e., morphology, syntax, lexicon, lexico-syntax, discourse, and others). We also control for other prompt engineering factors (e.g., prompt length, lexical diversity, and proximity to training data). Our results show a potential for language models to improve tasks when their prompts are adapted in specific paraphrase types (e.g., 6.7% median gain in Mixtral 8x7B; 5.5% in LLaMA 3 8B). In…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: LLaMA
