ChiEngMixBench: Evaluating Large Language Models on Spontaneous and Natural Chinese-English Code-Mixed Generation
Qingyan Yang, Tongxi Wang, Yunsheng Luo

TL;DR
This paper introduces ChiEngMixBench, a novel benchmark for evaluating the ability of large language models to generate authentic Chinese-English code-mixed language, emphasizing spontaneity and naturalness in real-world contexts.
Contribution
The paper presents the first scalable, community-based benchmark for Chinese-English code-mixing and reveals an emergent Terminology Layering Strategy aligned with linguistic theory.
Findings
The Spontaneity and Naturalness metrics systematically distinguish code-mixing performance across models.
Models exhibit an implicitly emergent Terminology Layering Strategy.
The construction pipeline enables scalable dataset development across domains and bilingual pairs.
Abstract
Code-mixing is increasingly prevalent in interactions between humans and large language models, yet existing work often reduces it to a translation or convertibility problem, making it difficult to assess whether a model's switching behavior is context-appropriate and aligned with human conventions. We introduce ChiEngMixBench, the first benchmark designed to evaluate code-mixing ability in authentic community contexts, built upon a general construction pipeline that enables scalable dataset development across domains and bilingual pairs. ChiEngMixBench formulates code-mixing as a cognitive alignment problem, characterized by two complementary signals: Spontaneity and Naturalness. Empirical evaluation shows that our metrics can systematically distinguish code-mixing performance across models. Beyond benchmarking, we further uncover an implicitly emergent Terminology Layering Strategy, a…
Taxonomy
Topics: Natural Language Processing Techniques · Multilingual Education and Policy · Text Readability and Simplification
