Recurrent-Neural-Network for Language Detection on Twitter Code-Switching Corpus

Joseph Chee Chang; Chu-Cheng Lin

arXiv:1412.4314 · cs.NE · December 23, 2014 · 20 cites

Open Access

TL;DR

This paper presents a recurrent neural network approach for language detection in code-switching Twitter data, outperforming previous SVM-based methods by leveraging raw features and word embeddings.

Contribution

It introduces a neural network model that uses only raw features and word embeddings for language detection in code-switching, eliminating the need for external linguistic tools.

Findings

01

Achieved 1% higher accuracy than the previous best SVM-based systems.

02

Reduced the error rate by 17% relative to those SVM-based systems on the same Twitter code-switching corpus.

03

Demonstrated the effectiveness of neural networks trained on raw features for language detection in code-switched text.
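
Findings 01 and 02 plausibly describe the same gain in two ways: an absolute change in accuracy and a relative change in error rate. The sketch below shows how the two measures relate. The baseline and new accuracies in it are hypothetical values chosen for illustration, not figures reported on this page.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Return the fraction of the baseline's errors that the new system removes."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    if baseline_err <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_err - new_err) / baseline_err


# Hypothetical accuracies, for illustration only (not taken from the paper).
baseline_acc = 0.94
new_acc = 0.95

reduction = relative_error_reduction(baseline_acc, new_acc)
print(f"absolute accuracy gain: {new_acc - baseline_acc:.1%}")
print(f"relative error reduction: {reduction:.1%}")
```

Under these assumed numbers, a one-point accuracy gain removes about one sixth of the baseline's errors, which shows why a small absolute gain can correspond to a much larger relative error reduction when the baseline is already accurate.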

Abstract

Mixed-language data is one of the difficult yet less explored domains of natural language processing. Most research in fields like machine translation or sentiment analysis assumes monolingual input. However, people who are capable of using more than one language often communicate using multiple languages at the same time. Sociolinguists believe this "code-switching" phenomenon to be socially motivated, for example to express solidarity or to establish authority. Most past work depends on external tools or resources, such as part-of-speech taggers, dictionary look-up, or named-entity recognizers, to extract rich features for training machine learning models. In this paper, we train recurrent neural networks with only raw features, and use word embeddings to automatically learn meaningful representations. Using the same mixed-language Twitter corpus, our system is able to outperform the…
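
The general idea of tagging each token with a language using word embeddings and a recurrent network can be sketched as follows. This is a minimal illustration, not the authors' model: the abstract is truncated and does not give the architecture, so the Elman-style recurrence, the label set `LABELS`, the layer sizes, and the class name `TinyRNNTagger` are all assumptions. The weights are random and untrained, so the predicted labels are meaningless; training (for example, backpropagation through time) is omitted.

```python
import math
import random

random.seed(0)

LABELS = ["lang1", "lang2", "other"]  # hypothetical label set
EMB_DIM, HID_DIM = 4, 5               # hypothetical layer sizes


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these from data."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNNTagger:
    """Forward pass only: embedding lookup -> tanh recurrence -> softmax per token."""

    def __init__(self, vocab):
        self.index = {word: i for i, word in enumerate(vocab)}
        self.emb = rand_matrix(len(vocab) + 1, EMB_DIM)  # last row: unknown words
        self.w_xh = rand_matrix(HID_DIM, EMB_DIM)
        self.w_hh = rand_matrix(HID_DIM, HID_DIM)
        self.w_hy = rand_matrix(len(LABELS), HID_DIM)

    def tag(self, tokens):
        hidden = [0.0] * HID_DIM
        labels = []
        for token in tokens:
            # Raw feature: the lower-cased token itself, mapped to an embedding.
            x = self.emb[self.index.get(token.lower(), len(self.index))]
            pre = [a + b for a, b in zip(matvec(self.w_xh, x),
                                         matvec(self.w_hh, hidden))]
            hidden = [math.tanh(p) for p in pre]
            probs = softmax(matvec(self.w_hy, hidden))
            labels.append(LABELS[probs.index(max(probs))])
        return labels


tokens = "me gusta this song".split()
tagger = TinyRNNTagger(vocab=sorted(set(tokens)))
predicted = tagger.tag(tokens)
print(list(zip(tokens, predicted)))
```

The hidden state carries context from earlier tokens into each prediction, which is why a recurrent model can use neighbouring words to disambiguate a token without hand-built features such as dictionaries or part-of-speech tags.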

Taxonomy

Topics: Natural Language Processing Techniques · Digital Communication and Language · Authorship Attribution and Profiling