Dataset of sentiment tagged language resources for Macedonian language

Sofija Kochovska; Jernej Vičič; Branko Kavšek

PMC · DOI: 10.1016/j.dib.2025.112384 · December 12, 2025

Open Access

TL;DR

This paper introduces a dataset of sentiment-tagged language resources for the Macedonian language, useful for sentiment analysis and other NLP tasks.

Contribution

The novelty lies in providing annotated sentiment resources for Macedonian, a less-resourced language.

Findings

01

The dataset includes sentiment-annotated words, stopwords, intensifiers and diminishers (AnAwords), and polarity shifters for Macedonian.

02

The resources are primarily intended for rule-based sentiment analysis but have broader potential applications.

Abstract

Macedonian is a South Slavic language spoken by about 2 million people, primarily in North Macedonia and among diaspora communities worldwide. It has several distinctive features. Most notably, it marks definiteness with articles attached to the end of nouns: for example, kniga (a book) becomes knigata (the book). It also does not use grammatical cases, which makes its grammar relatively straightforward compared to other Slavic languages. The dataset comprises two lists of sentiment-annotated words that form the core of the Macedonian sentiment-annotated lexicon, a list of stopwords, a list of Affirmative and non-Affirmative words (AnAwords) composed mostly of intensifiers and diminishers, and a list of polarity shifters. The presented materials are mainly intended for rule-based sentiment analysis, but some of the lists have much broader uses.
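To illustrate how lists of this kind are typically combined in rule-based sentiment analysis, the following is a minimal sketch and not the authors' method. The word sets are tiny hand-picked stand-ins rather than entries copied from the dataset, the modifier weights (1.5 and 0.5) are assumed values, and the function name `score` is hypothetical.

```python
# Toy stand-ins for the five lists (illustrative entries, NOT taken from the dataset).
POSITIVE = {"добар", "убав"}               # "good", "beautiful"
NEGATIVE = {"лош", "тажен"}                # "bad", "sad"
STOPWORDS = {"е", "и", "на"}               # "is", "and", "on"
MODIFIERS = {"многу": 1.5, "малку": 0.5}   # intensifier / diminisher; weights are assumed
SHIFTERS = {"не"}                          # "not"


def score(tokens):
    """Sum lexicon polarities, applying pending modifiers and shifters
    to the next content word. Stopwords do not clear the pending state."""
    total = 0.0
    weight = 1.0   # product of modifiers seen since the last content word
    flip = False   # True when an odd number of shifters is pending
    for tok in tokens:
        tok = tok.lower()
        if tok in SHIFTERS:        # checked first, in case a shifter is also a stopword
            flip = not flip
            continue
        if tok in MODIFIERS:
            weight *= MODIFIERS[tok]
            continue
        if tok in STOPWORDS:
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity:
            value = polarity * weight
            total += -value if flip else value
        # Any content word, sentiment-bearing or not, consumes the pending state.
        weight, flip = 1.0, False
    return total


print(score("филмот е многу добар".split()))  # 1.5
print(score("не е добар".split()))            # -1.0
```

In this sketch the stopword list lets a shifter or modifier reach across function words such as "е", so "не е добар" ("is not good") is scored as negative. A real system would replace the toy sets with the dataset's lists and would need to decide on tokenization and on how definite-article suffixes such as -ta are handled, neither of which is addressed here.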

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Authorship Attribution and Profiling · Natural Language Processing Techniques