Turkish Text Classification: From Lexicon Analysis to Bidirectional Transformer

Deniz Kavi

arXiv:2104.11642·cs.CL·April 26, 2021

TL;DR

This paper evaluates lexicon analysis, support vector machines, and extreme gradient boosting for Turkish text classification and sentiment analysis, and proposes a pretrained transformer-based classifier that outperforms these earlier approaches.

Contribution

It compares lexicon analysis, SVM, and XGBoost on Turkish text and proposes a pretrained transformer-based classifier for Turkish that outperforms them.

Findings

01. The pretrained transformer-based classifier outperforms previous methods.

02. All machine learning models in the paper are domain-independent and require no task-specific modifications.

03. The transformer achieves higher accuracy in Turkish text classification.

Abstract

Text classification has seen increased use in both academic and industry settings. Though rule-based methods have been fairly successful, supervised machine learning has been shown to be the most successful approach for most languages, with most of the research done on English. In this article, the success of lexicon analysis, support vector machines, and extreme gradient boosting for the tasks of text classification and sentiment analysis is evaluated in Turkish, and a pretrained transformer-based classifier is proposed that outperforms previous methods for Turkish text classification. In the context of text classification, all machine learning models proposed in the article are domain-independent and do not require any task-specific modifications.
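To make the rule-based baseline named in the abstract concrete, the following is a minimal sketch of lexicon analysis for sentiment, not the paper's implementation. The word lists `POSITIVE` and `NEGATIVE`, the function names, and the three-way labels are all hypothetical choices made for illustration; the abstract does not specify which lexicon or scoring rule was used.

```python
# Minimal lexicon-based sentiment sketch (illustrative assumption only).
# The word lists below are a hypothetical toy lexicon, not the paper's.
POSITIVE = {"güzel", "harika", "iyi", "mükemmel"}
NEGATIVE = {"kötü", "berbat", "korkunç", "yavaş"}

PUNCTUATION = ".,!?;:"


def lexicon_score(text: str) -> int:
    """Count positive tokens minus negative tokens after whitespace splitting."""
    tokens = [token.strip(PUNCTUATION) for token in text.split()]
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def classify(text: str) -> str:
    """Map the lexicon score to a label; a score of zero is treated as neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["film harika ve güzel.", "servis berbat ve yavaş!", "bugün hava"]:
        print(sentence, "->", classify(sentence))
```

Exact token matching like this misses inflected forms, which matters in an agglutinative language such as Turkish, where a single stem can appear with many suffixes. That limitation is one reason supervised models are a natural point of comparison.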

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining