Sentiment analysis model for Twitter data in Polish language

Karol Chlasta

arXiv:1911.00985 · cs.CL · November 5, 2019 · 1 citation

Open Access

TL;DR

This paper presents a sentiment analysis model for Polish tweets related to the 2015 presidential election, utilizing emoticons and words to classify sentiment with machine learning classifiers.

Contribution

It introduces a sentiment scoring method for Polish tweets and evaluates machine learning classifiers for automatic tweet classification.
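
The scoring idea can be illustrated with a minimal Python sketch (the paper itself reports an R implementation, which is not reproduced here). The lexicons `POSITIVE` and `NEGATIVE`, the function names, the "positives minus negatives" formula, and the three category labels are all assumptions made for illustration; the source only states that the score uses the number of positive and/or negative emoticons and Polish words in each tweet.

```python
# Hypothetical lexicons: the paper's actual emoticon and Polish word lists are not shown here.
# Entries are lowercase because tweets are lowercased before matching.
POSITIVE = {":)", ":-)", ":d", "dobry", "super", "brawo"}
NEGATIVE = {":(", ":-(", "zły", "wstyd", "kłamstwo"}


def sentiment_score(tweet):
    """Return (# positive tokens) - (# negative tokens) for one tweet.

    Assumption: simple whitespace tokenization and a difference-based score.
    """
    tokens = tweet.lower().split()
    positives = sum(1 for token in tokens if token in POSITIVE)
    negatives = sum(1 for token in tokens if token in NEGATIVE)
    return positives - negatives


def categorize(score):
    """Map a numeric score to an assumed three-way label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["Brawo :D super wynik", "wstyd :(", "wybory 2015"]:
        print(text, "->", categorize(sentiment_score(text)))
```

Labels produced this way could then serve as training targets for the classifiers the paper evaluates.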

Findings

01. Naive Bayes achieved 71.76% accuracy

02. Maximum Entropy achieved 77.32% accuracy

03. Implemented using the R programming language

Abstract

This paper presents a text mining analysis of tweets gathered during the Polish presidential election on May 10th, 2015. The project included implementing an engine to retrieve information from Twitter, building document corpora, cleaning the corpora, and creating a Term-Document Matrix. Each tweet in the text corpora was assigned a category based on its sentiment score. The score was calculated using the number of positive and/or negative emoticons and Polish words in each document. The resulting data set was used to train and test four machine learning classifiers in order to select those providing the most accurate automatic tweet classification. The Naive Bayes and Maximum Entropy algorithms achieved the best accuracy, 71.76% and 77.32% respectively. All implementation tasks were completed using the R programming language.
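
The corpus cleaning and Term-Document Matrix steps mentioned in the abstract might look roughly like the following Python sketch, which uses only the standard library. The specific cleaning rules (lowercasing, dropping URLs and @mentions) and the helper names `clean` and `term_document_matrix` are assumptions; the abstract does not say which cleaning steps were applied, and the original work was done in R.

```python
import re
from collections import Counter


def clean(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and return its word tokens.

    Assumed cleaning rules; \\w+ keeps Polish letters such as 'ł' and 'ż'.
    """
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    return re.findall(r"\w+", text)


def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts terms[i] in docs[j]."""
    counts = [Counter(clean(doc)) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[count[term] for count in counts] for term in terms]
    return terms, matrix


if __name__ == "__main__":
    tweets = ["Wybory wybory @user http://t.co/x", "Debata i wybory"]
    terms, matrix = term_document_matrix(tweets)
    for term, row in zip(terms, matrix):
        print(term, row)
```

Rows correspond to terms and columns to tweets, so each column is the bag-of-words vector a classifier would receive for that tweet.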

Taxonomy

Topics: Authorship Attribution and Profiling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining

Methods: Test