Automatic Language Identification in Texts: A Survey

Tommi Jauhiainen; Marco Lui; Marcos Zampieri; Timothy Baldwin; Krister Lindén

arXiv:1804.08186·cs.CL·November 22, 2018

TL;DR

This survey comprehensively reviews the history, features, methods, and applications of automatic language identification, highlighting recent research trends and future challenges in the field.

Contribution

It provides an extensive overview of LI features and methods under a unified notation, covers evaluation methods and open issues, and proposes directions for future research.

Findings

01 Overview of LI features and methods

02 Discussion of evaluation techniques and applications

03 Identification of open issues and future research directions

Abstract

Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used so far in the LI literature. For describing the features and methods we introduce a unified notation. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.

Code & Models

Repositories

Dagobert42/langID-NLP
