The Mask of Civility: Benchmarking Chinese Mock Politeness Comprehension in Large Language Models
Yitong Zhang, Yuhan Xiang, Mingxuan Liu

TL;DR
This paper systematically evaluates large language models' ability to understand Chinese politeness, impoliteness, and mock politeness using pragmatic theories and diverse prompting strategies.
Contribution
It introduces a novel dataset and benchmarking framework for Chinese pragmatic comprehension in LLMs, integrating linguistic theory with model evaluation.
Findings
GPT-5.1 outperforms other models in recognizing politeness phenomena.
Knowledge-enhanced prompting improves model performance.
The study bridges linguistic pragmatics and AI evaluation methods.
Abstract
From a pragmatic perspective, this study systematically evaluates how representative large language models (LLMs) differ in recognizing politeness, impoliteness, and mock politeness in Chinese. To address existing gaps in pragmatic comprehension, the research draws on Rapport Management Theory and the Model of Mock Politeness to construct a three-category dataset combining authentic and simulated Chinese discourse. Six representative models, including GPT-5.1 and DeepSeek, were evaluated under four prompting conditions: zero-shot, few-shot, knowledge-enhanced, and hybrid. The study is an attempt within the paradigm of "Great Linguistics," offering a novel approach to applying pragmatic theory in the age of technological transformation. It also responds to the contemporary…
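As a rough illustration of how the four prompting conditions named in the abstract might differ, the sketch below assembles a classification prompt for each one. It is not the authors' implementation: the function `build_prompt`, the label strings, the strategy names, the prompt wording, and the background note are all assumptions made for this example, and the hybrid condition is assumed here to mean few-shot examples plus the background note.

```python
from typing import Iterable, Tuple

# Hypothetical label strings for the three-category task.
LABELS = ("polite", "impolite", "mock polite")

# Hypothetical background note; the paper's actual knowledge text is not shown here.
KNOWLEDGE_NOTE = (
    "Mock politeness: the surface form is polite, "
    "but the intended meaning in context is impolite."
)

STRATEGIES = ("zero-shot", "few-shot", "knowledge", "hybrid")


def build_prompt(
    utterance: str,
    strategy: str,
    examples: Iterable[Tuple[str, str]] = (),
) -> str:
    """Assemble a prompt for one utterance under one prompting condition."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")

    parts = [f"Classify the Chinese utterance as one of: {', '.join(LABELS)}."]

    # Knowledge-enhanced and hybrid prompts prepend the theory note.
    if strategy in ("knowledge", "hybrid"):
        parts.append(f"Background: {KNOWLEDGE_NOTE}")

    # Few-shot and hybrid prompts include labeled demonstrations.
    if strategy in ("few-shot", "hybrid"):
        for text, label in examples:
            parts.append(f"Utterance: {text}\nLabel: {label}")

    # The target utterance always comes last, with the label left open.
    parts.append(f"Utterance: {utterance}\nLabel:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Invented demonstration pair, for illustration only.
    demos = [("您可真是太聪明了，这么简单的事都能搞砸。", "mock polite")]
    print(build_prompt("谢谢您的帮助。", "hybrid", demos))
```

Keeping every condition in one function means the target utterance and label set stay identical across conditions, so any difference in model output can be attributed to the added background note or demonstrations.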
