Linguistic Analysis of Sinhala YouTube Comments on Sinhala Music Videos: A Dataset Study
W. M. Yomal De Mel, Nisansa de Silva

TL;DR
This study analyzes Sinhala YouTube comments on Sinhala music videos to build a linguistic dataset that supports future research in Music Information Retrieval and Music Emotion Recognition within the Sinhala cultural context.
Contribution
The paper introduces a curated dataset of Sinhala comments and a derived stop-word list, supporting computational work on MIR and MER for Sinhala music, an underexplored area of music studies.
Findings
Dataset of 63,471 Sinhala comments created
Derived 964 Sinhala stop-words for NLP tasks
Confirmed that Sinhala YouTube comments are representative of the general Sinhala language
Abstract
This research investigates Music Information Retrieval (MIR) and Music Emotion Recognition (MER) in relation to Sinhala songs, an underexplored field in music studies. The study analyzes the behavior of Sinhala comments on YouTube videos of Sinhala songs, using social media comments as the primary data source. Comments were drawn from 27 YouTube videos covering 20 different Sinhala songs, carefully selected to maintain strict linguistic reliability and ensure relevance. A total of 93,116 comments were gathered, and the dataset was then refined through advanced filtering methods and transliteration mechanisms, resulting in 63,471 Sinhala comments. Additionally, 964 stop-words specific to the Sinhala language were algorithmically derived, of which 182 matched exactly with English stop-words from NLTK…
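The abstract does not spell out how the comments were filtered or how the stop-words were derived, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes a comment counts as Sinhala when at least half of its non-whitespace characters fall in the Unicode Sinhala block (U+0D80 to U+0DFF), and it treats the most frequent whitespace-separated tokens as stop-word candidates. The function names, the 0.5 threshold, and the sample comments are all hypothetical, and the transliteration step mentioned in the abstract is not covered.

```python
from collections import Counter

# Unicode Sinhala block; the range is standard, its use as a filter is an assumption.
SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF


def sinhala_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters in the Sinhala block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    sinhala = sum(SINHALA_START <= ord(ch) <= SINHALA_END for ch in chars)
    return sinhala / len(chars)


def filter_sinhala(comments, threshold=0.5):
    """Keep comments whose Sinhala ratio meets the (hypothetical) threshold."""
    return [c for c in comments if sinhala_ratio(c) >= threshold]


def frequent_tokens(comments, top_k=10):
    """Return the top_k most frequent whitespace tokens as stop-word candidates."""
    counts = Counter(tok for c in comments for tok in c.split())
    return [tok for tok, _ in counts.most_common(top_k)]


# Hypothetical sample comments, for illustration only.
sample = [
    "මේ ගීතය ලස්සනයි",
    "nice song",
    "මේ සින්දුව හොඳයි",
    "love this ගීතය",
]
kept = filter_sinhala(sample)
candidates = frequent_tokens(kept, top_k=1)
```

In this sketch the English-only and mostly-English comments are dropped, and the candidate list is drawn only from what remains. A real pipeline would likely need proper tokenization and a more principled cut-off than raw frequency.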
Taxonomy
Topics: Subtitles and Audiovisual Media · Sentiment Analysis and Opinion Mining
Methods: Sparse Evolutionary Training
