No Universal Courtesy: A Cross-Linguistic, Multi-Model Study of Politeness Effects on LLMs Using the PLUM Corpus
Hitesh Mehta, Arjit Saxena, Garima Chhikara, Rohit Kumar

TL;DR
This study investigates how politeness and impoliteness in prompts affect large language models across multiple languages and models, revealing language- and model-dependent variations in response quality.
Contribution
It introduces the PLUM corpus and provides a comprehensive cross-linguistic, multi-model analysis of politeness effects on LLMs, highlighting their non-universal impact.
Findings
Politeness improves response quality by up to 11%.
Sensitivity to tone varies by model: Llama is the most sensitive, while GPT is more robust.
Politeness effects differ significantly across languages and models.
Abstract
This paper explores the response of Large Language Models (LLMs) to user prompts with different degrees of politeness and impoliteness. The Politeness Theory by Brown and Levinson and the Impoliteness Framework by Culpeper form the basis of experiments conducted across three languages (English, Hindi, Spanish), five models (Gemini-Pro, GPT-4o Mini, Claude 3.7 Sonnet, DeepSeek-Chat, and Llama 3), and three interaction histories between users (raw, polite, and impolite). Our sample consists of 22,500 pairs of prompts and responses of various types, evaluated across five levels of politeness using an eight-factor assessment framework: coherence, clarity, depth, responsiveness, context retention, toxicity, conciseness, and readability. The findings show that model performance is highly influenced by tone, dialogue history, and language. While polite prompts enhance the average response…
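The design described in the abstract can be sanity-checked with a little arithmetic. The sketch below enumerates the experimental grid from the factors the abstract names. It rests on two assumptions the abstract does not state: that the design is fully crossed and balanced, and that politeness levels can be stood in for by the integers 1 to 5 (their actual labels are not given). Under those assumptions, 22,500 pairs over 3 x 5 x 3 x 5 = 225 conditions would work out to 100 pairs per condition.

```python
from itertools import product

# Factor levels as listed in the abstract.
LANGUAGES = ["English", "Hindi", "Spanish"]
MODELS = ["Gemini-Pro", "GPT-4o Mini", "Claude 3.7 Sonnet", "DeepSeek-Chat", "Llama 3"]
HISTORIES = ["raw", "polite", "impolite"]

# Assumption: the abstract reports five politeness levels but not their
# labels, so placeholder integers are used here.
POLITENESS_LEVELS = [1, 2, 3, 4, 5]

# Total prompt-response pairs reported in the abstract.
TOTAL_PAIRS = 22_500


def design_cells():
    """Return every (language, model, history, level) combination.

    Assumption: the study is fully crossed; the abstract does not confirm this.
    """
    return list(product(LANGUAGES, MODELS, HISTORIES, POLITENESS_LEVELS))


cells = design_cells()

# Assumption: pairs are spread evenly across conditions (balanced design).
pairs_per_cell = TOTAL_PAIRS / len(cells)

print(f"conditions: {len(cells)}")
print(f"pairs per condition (if balanced): {pairs_per_cell:.0f}")
```

If the real corpus is unbalanced or omits some combinations, the per-condition count will differ. The figure is only a consistency check on the reported totals.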
