The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models
Shuai Wu, Xue Li, Yanna Feng, Yufang Li, Zhijun Wang, Ran Wang

TL;DR
This paper systematically analyzes the prevalence of verbal tics in eight state-of-the-art large language models, introducing the Verbal Tic Index and revealing significant inter-model and cross-lingual variations.
Contribution
The paper presents a novel evaluation framework and the Verbal Tic Index to quantify verbal tics, and highlights how their prevalence correlates with model alignment and naturalness across multiple models and languages.
Findings
Gemini 3.1 Pro has the highest Verbal Tic Index (0.590).
DeepSeek V3.2 has the lowest Verbal Tic Index (0.295).
Verbal tics increase over multi-turn conversations and are more common in subjective tasks.
Abstract
As Large Language Models (LLMs) continue to evolve through alignment techniques such as Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI, an increasingly conspicuous phenomenon has emerged: the proliferation of verbal tics, the repetitive, formulaic linguistic patterns that pervade model outputs. These range from sycophantic openers ("That's a great question!", "Awesome!") to pseudo-empathetic affirmations ("I completely understand your concern", "I'm right here to catch you") and overused vocabulary ("delve", "tapestry", "nuanced"). In this paper, we present a systematic analysis of the verbal tic phenomenon across eight state-of-the-art LLMs: GPT-5.4, Claude Opus 4.7, Gemini 3.1 Pro, Grok 4.2, Doubao-Seed-2.0-pro, Kimi K2.5, DeepSeek V3.2, and MiMo-V2-Pro. Using a custom evaluation framework for standardized API-based evaluation, we assess 10,000 prompts across 10…
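To make the three tic categories named in the abstract concrete, here is a minimal sketch of a phrase counter. It is not the paper's Verbal Tic Index, whose definition is not given in this summary. The names `TIC_PATTERNS`, `count_tics`, and `tics_per_100_words` are hypothetical, the category labels are our own, and the phrase lists contain only the examples quoted in the abstract.

```python
import re

# Hypothetical phrase lists, seeded only with the examples quoted in the abstract.
# A real study would need far larger, validated lexicons per language.
TIC_PATTERNS = {
    "sycophantic_opener": [r"that's a great question", r"\bawesome\b"],
    "pseudo_empathy": [
        r"i completely understand your concern",
        r"i'm right here to catch you",
    ],
    "overused_vocabulary": [r"\bdelve\b", r"\btapestry\b", r"\bnuanced\b"],
}


def count_tics(text):
    """Return the number of pattern matches per category (case-insensitive)."""
    # Normalize curly apostrophes so "That’s" and "That's" match the same pattern.
    lowered = text.lower().replace("\u2019", "'")
    return {
        category: sum(len(re.findall(pattern, lowered)) for pattern in patterns)
        for category, patterns in TIC_PATTERNS.items()
    }


def tics_per_100_words(text):
    """Illustrative rate: total tic matches per 100 whitespace-separated words.

    This is an assumed, simplified metric, not the Verbal Tic Index from the paper.
    """
    n_words = len(text.split())
    if n_words == 0:
        return 0.0
    return 100.0 * sum(count_tics(text).values()) / n_words


if __name__ == "__main__":
    sample = "That's a great question! Let us delve into the nuanced tapestry of options."
    print(count_tics(sample))
    print(round(tics_per_100_words(sample), 2))
```

A counter like this only catches exact surface phrases. Paraphrased or translated tics would need fuzzier matching, which is one reason to treat the rate above as a rough illustration rather than a measurement.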
