# A dataset of subjectivity classification in Indonesian ride-hailing app reviews

**Authors:** Violeta Arifin, Yuriashi Adelia Putri, Richard Wiputra

PMC · DOI: 10.1016/j.dib.2025.112348 · 2025-12-01

## TL;DR

This paper introduces a dataset of 1338 Indonesian ride-hailing app reviews labeled for subjectivity, aiding in the analysis of user feedback in low-resource languages.

## Contribution

The contribution is an Indonesian subjectivity dataset of ride-hailing app reviews, labeled by two independent annotators and finalized through consensus-based adjudication.
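To illustrate what consensus-based labeling over two annotators can look like in practice, here is a minimal sketch. It is not the authors' procedure: the label names (`subjective`, `objective`), the function names, and the use of Cohen's kappa as an agreement measure are all assumptions made for illustration; the summary does not state which agreement statistic, if any, was reported.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators (an assumed agreement measure)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[l] * counts_b[l] for l in all_labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)

def split_for_adjudication(labels_a, labels_b):
    """Return (agreed labels by index, indices needing adjudication)."""
    agreed, disputed = {}, []
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        if a == b:
            agreed[i] = a
        else:
            disputed.append(i)
    return agreed, disputed

# Hypothetical toy labels, not taken from the dataset.
annotator_1 = ["subjective", "subjective", "objective", "objective"]
annotator_2 = ["subjective", "objective", "objective", "objective"]
kappa = cohen_kappa(annotator_1, annotator_2)
agreed, disputed = split_for_adjudication(annotator_1, annotator_2)
```

In this sketch, only the disputed indices would go to a consensus discussion; items on which both annotators already agree keep their shared label.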

## Key findings

- The dataset includes 1338 Indonesian ride-hailing app reviews annotated for subjectivity.
- The dataset is suitable for benchmarking classifiers and analyzing cross-domain generalization.
- The structured annotation design supports reproducible analysis across modeling approaches.

## Abstract

As more people share their experiences online, understanding whether their reviews are subjective or objective has become key to evaluating how services are perceived, especially in the Indonesian ride-hailing industry. This article presents a subjectivity dataset of 1338 Indonesian-language ride-hailing app reviews collected from the Google Play Store. To enhance the quality and consistency of the data for analysis, all reviews were preprocessed to eliminate elements such as URLs, emojis, and extraneous characters. Two independent annotators manually annotated the dataset, followed by a consensus-based adjudication process to produce a high-quality classification. The annotation supports robust evaluation of subjectivity detection models and contributes toward developing more nuanced natural language understanding systems in low-resource languages. The data can be reused for multiple research purposes, including benchmarking supervised classifiers, evaluating multilingual and large language models, analyzing cross-domain generalization, and extending subjectivity detection research to other Southeast Asian languages. The dataset also serves as a high-quality reference resource due to its structured annotation design and consensus-based labeling procedure, which enable reproducible analysis across different modeling approaches. By offering a transparent and fully documented dataset, this work provides a valuable foundation for developing intelligent systems capable of interpreting user feedback in real-world digital service environments, particularly in the Indonesian ride-hailing sector.
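The preprocessing step mentioned in the abstract (removing URLs, emojis, and extraneous characters) could be approximated as follows. This is a rough sketch under stated assumptions, not the authors' pipeline: the regular expressions, the choice of which punctuation to keep, and the function name `clean_review` are all hypothetical.

```python
import re

# Assumed patterns; the paper's exact cleaning rules are not given here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Common emoji and pictograph ranges, plus variation selector and joiner.
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]")
# Anything that is not a word character, whitespace, or basic punctuation.
EXTRA_RE = re.compile(r"[^\w\s.,!?'-]")
WHITESPACE_RE = re.compile(r"\s+")

def clean_review(text: str) -> str:
    """Strip URLs, emojis, and stray symbols, then normalize whitespace."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = EXTRA_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()

# Hypothetical review text, not taken from the dataset.
cleaned = clean_review("Driver ramah 😊 cek https://example.com sekarang!!")
```

URLs are removed before the symbol filter so that their slashes and colons do not leave fragments behind; whether to keep punctuation such as `!` is a design choice that may matter for subjectivity cues.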

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12765260/full.md

---
Source: https://tomesphere.com/paper/PMC12765260