Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases
Rena Gao, Xuetong Wu, Tatsuki Kuribayashi, Mingrui Ye, Siya Qi, Carsten Roever, Yuanxing Liu, Zheng Yuan, Jey Han Lau

TL;DR
This paper investigates whether large language models can simulate non-native English speech influenced by native language biases, using information-theoretic analysis to compare model outputs with human L2 learner data.
Contribution
It introduces an information-theoretic framework for analyzing L1-dependent biases in LLM-generated L2 English dialogue, highlighting the ability of LLMs to mimic human L2 language patterns.
Findings
LLMs replicate L1-dependent linguistic biases observed in human L2 learners.
Different native languages influence specific grammatical and lexical patterns in LLM outputs.
Modern LLMs show potential for L2 dialogue generation and educational assessment.
Abstract
This study evaluates the ability of Large Language Models (LLMs) to simulate the non-native-like English use observed in human second language (L2) learners, whose English is subject to interference from their native first language (L1). In dialogue-based interviews, we prompt LLMs to mimic L2 English learners with specific L1s (e.g., Japanese, Thai, Urdu) across seven languages, comparing their outputs to real L2 learner data. Our analysis examines L1-driven linguistic biases, such as reference word usage and avoidance behaviors, using information-theoretic and distributional density measures. Results show that modern LLMs (e.g., Qwen2.5, LLAMA3.3, DeepseekV3, GPT-4o) replicate L1-dependent patterns observed in human L2 data, with distinct influences from various languages (e.g., Japanese, Korean, and Mandarin significantly affect tense agreement, and Urdu influences noun-verb collocations). Our results reveal the…
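The abstract does not say which information-theoretic measures the paper uses. As a rough illustration only, the sketch below compares how two groups of speakers distribute their usage over a set of words (for example, reference words) with the Jensen-Shannon divergence. The choice of this measure, the add-one smoothing, and every function name here are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def distribution(tokens, vocab):
    """Relative frequency of each vocab item, with add-one smoothing (an assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return [(counts[w] + 1) / total for w in vocab]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric, bounded between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def compare_feature_usage(human_tokens, llm_tokens):
    """Hypothetical helper: divergence between human and LLM usage distributions."""
    vocab = sorted(set(human_tokens) | set(llm_tokens))
    p = distribution(human_tokens, vocab)
    q = distribution(llm_tokens, vocab)
    return js_divergence(p, q)


if __name__ == "__main__":
    # Toy token lists, invented purely to exercise the code; not data from the paper.
    human = ["this", "that", "it", "it", "this"]
    llm = ["it", "it", "it", "that", "this"]
    print(f"JS divergence: {compare_feature_usage(human, llm):.4f} bits")
```

Under these assumptions, a value near 0 would mean the simulated learner distributes its usage over these words much as the human learners do, while a value near 1 would mean the two distributions barely overlap.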
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
