Towards an automatic recognition of mixed languages: The Ukrainian-Russian hybrid language Surzhyk
Nataliya Sira, Giorgio Maria Di Nunzio, Viviana Nosilia

TL;DR
This paper presents an initial approach to automatically recognize the Ukrainian-Russian hybrid language Surzhyk by analyzing spoken samples, creating identification rules, and testing their effectiveness using R programming tools.
Contribution
It introduces a novel method for identifying Surzhyk elements through example-based rules and computational implementation, addressing a gap in hybrid language recognition.
Findings
Rules effectively identify Surzhyk patterns
Analysis of spoken samples informs rule creation
Implementation demonstrates promising results
Abstract
Language interference is common in today's multilingual societies, where more and more languages are in contact, and its overall result is the creation of hybrid languages. These languages, together with doubts about their right to be officially recognised, have raised in computational linguistics the problem of their automatic identification and further processing. In this paper, we propose a first attempt to identify the elements of a Ukrainian-Russian hybrid language, Surzhyk, by adopting example-based rules created with the tools of the R programming language. Our example-based study consists of: 1) analysing spoken samples of Surzhyk recorded by Del Gaudio (2010) in the Kyiv area and creating a written corpus; 2) producing specific rules for identifying Surzhyk patterns and implementing them; 3) testing the code and analysing the…
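The rule-based identification step described in the abstract can be illustrated with a minimal sketch. The paper implements its rules in R and does not list them here, so everything below is an assumption: the code is in Python, the names `RULES`, `tokenize`, `flag_tokens`, and `surzhyk_ratio` are hypothetical, and the three rules are illustrative stand-ins (two commonly cited lexical items and one orthographic check for letters absent from the Ukrainian alphabet), not rules drawn from the paper or from Del Gaudio's samples.

```python
import re

# Hypothetical example-based rules: (compiled pattern, human-readable label).
# These are illustrative assumptions, NOT the rules used in the paper.
RULES = [
    (re.compile(r"^шо$"), "lexical: 'шо' in place of standard Ukrainian 'що'"),
    (re.compile(r"^(тоже|даже)$"), "lexical: Russian-origin particle"),
    (re.compile(r"[ыэъё]"), "orthographic: letter absent from the Ukrainian alphabet"),
]

# A token is a run of letters, optionally joined by an apostrophe (e.g. м'ясо).
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’ʼ][^\W\d_]+)*")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def flag_tokens(text):
    """Return (token, label) pairs for tokens matched by the first applicable rule."""
    hits = []
    for tok in tokenize(text):
        for pattern, label in RULES:
            if pattern.search(tok):
                hits.append((tok, label))
                break  # one label per token is enough for this sketch
    return hits


def surzhyk_ratio(text):
    """Share of tokens flagged by any rule; 0.0 for text with no tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(flag_tokens(text)) / len(tokens)
```

Calling `flag_tokens` on a sentence returns the flagged tokens with the rule that fired, and `surzhyk_ratio` gives a crude per-sentence mixing score. A real rule set would presumably need morphological and phonetic patterns as well as a far larger lexicon.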
Taxonomy
Topics: Linguistics, Language Diversity, and Identity · Linguistic research and analysis · Authorship Attribution and Profiling
