# KanuriSenti: A novel dataset for sentiment analysis in the under-resourced Kanuri language

**Authors:** Bashir Maina Saleh, Saurabh Bilgaiyan, Santwana Sagnika

PMC · DOI: 10.1016/j.dib.2025.111758 · Data in Brief · 2025-06-07

## TL;DR

This paper introduces KanuriSenti, a new sentiment analysis dataset for the under-resourced Kanuri language to improve NLP resources in Africa.

## Contribution

KanuriSenti is the first structured and sentiment-annotated dataset for the Kanuri language.

## Key findings

- KanuriSenti contains over 10,000 entries labeled with sentiment polarity and annotated for Valence, Arousal, and Dominance.
- Annotation reliability was validated using Cohen’s Kappa, and lexical richness was confirmed via Type-Token Ratio.
- The dataset provides a benchmark for sentiment analysis in low-resource language settings.
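
As a rough illustration of the reliability check named above, Cohen's Kappa compares the agreement two annotators actually reach with the agreement expected by chance. The sketch below assumes exactly two annotators who label the same items with the three polarity classes. The function name `cohens_kappa` and the sample labels are hypothetical and are not taken from the paper, whose exact procedure is not described in this summary.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four entries (illustrative only, not dataset content).
annotator_1 = ["positive", "negative", "neutral", "positive"]
annotator_2 = ["positive", "negative", "positive", "positive"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Values near 1 indicate agreement well beyond chance, while values near 0 indicate agreement no better than chance.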

## Abstract

This paper presents KanuriSenti, a novel sentiment analysis dataset developed for Kanuri, a major yet under-resourced language spoken across Nigeria and the Lake Chad region. The dataset addresses a critical gap in Natural Language Processing by providing structured and sentiment-annotated Kanuri data, which has been largely absent from existing resources. KanuriSenti consists of a lexicon-based dataset containing over 10,000 entries labeled with sentiment polarity (positive, negative, neutral), and an affective E-ANEW-style dataset annotated across Valence, Arousal, and Dominance dimensions by native Kanuri speakers. Annotation consistency and reliability were validated using Cohen’s Kappa, and lexical richness was measured using the Type-Token Ratio, confirming our dataset's suitability for sentiment-related tasks. While the dataset is designed specifically for sentiment analysis, its cultural and linguistic authenticity makes it a valuable benchmark for evaluating sentiment models in low-resource language settings and advancing equitable language technology in Africa.
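
The Type-Token Ratio mentioned in the abstract is the number of distinct word forms (types) divided by the total number of words (tokens). The following is a minimal sketch, assuming lowercased whitespace tokenization. The helper name `type_token_ratio` and the placeholder tokens are hypothetical, and the tokenization used for the paper's own measurement is not stated in this summary.

```python
def type_token_ratio(tokens):
    """Return distinct tokens (types) divided by total tokens."""
    if not tokens:
        raise ValueError("token list must not be empty")
    return len(set(tokens)) / len(tokens)


# Placeholder tokens stand in for real entries; they are not Kanuri words.
sample_text = "w1 w2 w1 w3"
tokens = sample_text.lower().split()  # assumption: simple whitespace tokenization
ttr = type_token_ratio(tokens)
```

A ratio closer to 1 means fewer repeated word forms. Because the ratio tends to fall as a text grows longer, comparisons are most meaningful between samples of similar size.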

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12226035/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12226035/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12226035/full.md

---
Source: https://tomesphere.com/paper/PMC12226035