Can We Still Hear the Accent? Investigating the Resilience of Native Language Signals in the LLM Era
Nabelanita Utami, Ryohei Sasano

TL;DR
This paper examines whether the rise of large language models has diminished the linguistic signals of authors' native languages in research papers, finding a general decline but with notable anomalies in certain languages.
Contribution
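It introduces a semi-automated framework for constructing a labeled dataset of ACL Anthology papers, fine-tunes a native language identification (NLI) classifier on it, and analyzes how NLI performance changes across three eras of NLP research: pre-neural network, pre-LLM, and post-LLM.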
Findings
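NLI performance declines consistently across the three eras (pre-neural network, pre-LLM, post-LLM).
In the post-LLM era, Chinese and French show unexpected resistance or divergent trends.
In the post-LLM era, Japanese and Korean show sharper-than-expected declines in NLI performance.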
Abstract
The evolution of writing assistance tools from machine translation to large language models (LLMs) has changed how researchers write. This study investigates whether this shift is homogenizing research papers by analyzing native language identification (NLI) trends in ACL Anthology papers across three eras: pre-neural network (NN), pre-LLM, and post-LLM. We construct a labeled dataset using a semi-automated framework and fine-tune a classifier to detect linguistic fingerprints of author backgrounds. Our analysis shows a consistent decline in NLI performance over time. Interestingly, the post-LLM era reveals anomalies: while Chinese and French show unexpected resistance or divergent trends, Japanese and Korean exhibit sharper-than-expected declines.
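The era-by-era comparison described in the abstract can be illustrated with a minimal sketch. It assumes the classifier's predictions are available as (era, true language, predicted language) triples and that performance is measured as per-language accuracy. The function name, the record layout, and the toy records are all hypothetical; the paper's actual metric and data format may differ.

```python
from collections import defaultdict


def accuracy_by_era_and_language(records):
    """Return {(era, language): accuracy} for an iterable of
    (era, true_language, predicted_language) triples.

    Hypothetical helper: the paper does not specify this interface.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for era, true_lang, pred_lang in records:
        key = (era, true_lang)
        total[key] += 1
        if pred_lang == true_lang:
            correct[key] += 1
    # Fraction of papers whose author language was identified correctly,
    # computed separately for each (era, language) pair.
    return {key: correct[key] / total[key] for key in total}


# Made-up records for illustration only; these are not results from the paper.
toy_records = [
    ("pre-NN", "Japanese", "Japanese"),
    ("pre-NN", "Japanese", "Japanese"),
    ("post-LLM", "Japanese", "Korean"),
    ("post-LLM", "Japanese", "Japanese"),
    ("post-LLM", "French", "French"),
]

scores = accuracy_by_era_and_language(toy_records)
for (era, lang), acc in sorted(scores.items()):
    print(f"{era:9s} {lang:9s} {acc:.2f}")
```

Comparing the scores for one language across eras is what would surface a decline, or an anomaly such as a language whose score does not drop.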
