I can tell whether you are a Native Hawlêri Speaker! How ANN, CNN, and RNN perform in NLI-Native Language Identification
Hardi Garari, Hossein Hassani

TL;DR
This study evaluates neural network models (ANN, CNN, and RNN) for native language identification of Hewleri Kurdish speakers from speech data, reaching up to 95.92% accuracy, and creates a novel dataset for this under-resourced subdialect.
Contribution
It introduces the first speech dataset for Hewleri Kurdish NLI and compares ANN, CNN, and RNN models, with the RNN performing best.
Findings
RNN achieved 95.92% accuracy on 5-second segments
The dataset is the first of its kind for Hewleri Kurdish NLI
Neural networks can effectively identify native dialects from speech
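The Findings report accuracy on 5-second segments, which implies that recordings are cut into fixed-length pieces before classification. The sketch below illustrates that idea under stated assumptions; it is not the authors' pipeline. The function names `segment_audio` and `accuracy`, the 16 kHz sample rate, the `"native"`/`"non-native"` labels, and the choice to drop (rather than pad) a trailing partial segment are all hypothetical. Each segment is assumed to inherit its speaker's label, so accuracy is counted per segment.

```python
from typing import List, Sequence


def segment_audio(samples: Sequence[float], sample_rate: int,
                  segment_seconds: float = 5.0) -> List[Sequence[float]]:
    """Split a mono recording into non-overlapping fixed-length segments.

    Assumption (not taken from the paper): any trailing remainder shorter
    than one full segment is dropped rather than zero-padded.
    """
    if sample_rate <= 0 or segment_seconds <= 0:
        raise ValueError("sample_rate and segment_seconds must be positive")
    # Samples per segment, e.g. 16000 Hz * 5 s = 80000 samples.
    seg_len = int(round(sample_rate * segment_seconds))
    if seg_len == 0:
        raise ValueError("segment is shorter than one sample")
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of segments whose predicted label matches the true label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Usage: a 12-second recording at a hypothetical 16 kHz sample rate
# yields two full 5-second segments; the final 2 seconds are dropped.
recording = [0.0] * (16000 * 12)
segments = segment_audio(recording, sample_rate=16000)
print(len(segments), len(segments[0]))  # 2 80000
```

One caveat with segment-level evaluation in general: segments from the same speaker should not appear in both the training and test sets, or the score can reflect speaker recognition rather than dialect identification. This summary does not state how the paper splits its data.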
Abstract
Native Language Identification (NLI) is a task in Natural Language Processing (NLP) that typically determines the native language of an author from their writing or of a speaker from their speech. It has applications in various areas, such as forensic linguistics and general linguistic studies. Although considerable research has been conducted on NLI across different languages, such as English and German, the literature indicates a significant gap regarding NLI for dialects and subdialects. The gap widens further for less-resourced languages such as Kurdish. This research focuses on NLI within a subdialect of Sorani (Central) Kurdish. It investigates NLI for Hewlêri, a subdialect spoken in Hewlêr (Erbil), the capital of the Kurdistan Region of Iraq. We collected about 24 hours of speech by recording interviews with 40 native or…
Taxonomy
Topics: Authorship Attribution and Profiling · Speech Recognition and Synthesis · Language and cultural evolution
