Semantic Communities and Boundary-Spanning Lyrics in K-pop: A Graph-Based Unsupervised Analysis
Oktay Karakuş

TL;DR
This paper introduces a graph-based, unsupervised method for discovering semantic communities in K-pop lyrics. It finds that boundary-spanning songs show higher lexical diversity and lower repetition, and that the method works across languages without supervision.
Contribution
It presents a novel graph-based framework for unsupervised semantic community detection and boundary analysis in multilingual lyric corpora, overcoming limitations of supervised methods.
Findings
Boundary-spanning lyrics have higher lexical diversity than core community members.
Boundary lyrics exhibit lower repetition than core community members.
The framework is language-agnostic and effective without supervision.
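To make the two lyric-level quantities above concrete, here is a minimal sketch of how lexical diversity and repetition could be measured. This listing does not say which measures the paper uses, so the type-token ratio and the repeated-line fraction below are assumptions chosen for illustration, as are the function names and the toy lyrics.

```python
from collections import Counter

def type_token_ratio(lines):
    """Lexical diversity proxy (assumed): unique tokens / total tokens."""
    tokens = [tok for line in lines for tok in line.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def repeated_line_fraction(lines):
    """Repetition proxy (assumed): share of lines whose text occurs more than once."""
    normalized = [line.strip().lower() for line in lines if line.strip()]
    if not normalized:
        return 0.0
    counts = Counter(normalized)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(normalized)

# Toy lyrics, invented for illustration only.
core_like = ["la la la", "la la la", "oh baby"]
boundary_like = ["city lights fade", "we run past midnight", "oh baby"]

for name, song in [("core-like", core_like), ("boundary-like", boundary_like)]:
    print(name, type_token_ratio(song), repeated_line_fraction(song))
```

Whitespace tokenization is a simplification here; multilingual lyrics would likely need a language-aware tokenizer.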
Abstract
Large-scale lyric corpora present unique challenges for data-driven analysis, including the absence of reliable annotations, multilingual content, and high levels of stylistic repetition. Most existing approaches rely on supervised classification, genre labels, or coarse document-level representations, limiting their ability to uncover latent semantic structure. We present a graph-based framework for unsupervised discovery and evaluation of semantic communities in K-pop lyrics using line-level semantic representations. By constructing a similarity graph over lyric texts and applying community detection, we uncover stable micro-theme communities without genre, artist, or language supervision. We further identify boundary-spanning songs via graph-theoretic bridge metrics and analyse their structural properties. Across multiple robustness settings, boundary-spanning lyrics exhibit higher…
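The pipeline described in the abstract (similarity graph, community detection, bridge scoring) can be sketched in a few lines of pure Python. Everything specific below is an assumption made for illustration: the 2-D toy vectors stand in for real lyric embeddings, the 0.75 cosine threshold is arbitrary, weighted label propagation stands in for whatever community detection algorithm the paper applies, and the participation coefficient stands in for its bridge metrics.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_graph(vectors, threshold):
    """Weighted adjacency dict keeping pairs with cosine similarity >= threshold."""
    graph = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                graph[i][j] = sim
                graph[j][i] = sim
    return graph

def label_weights(graph, labels, node):
    """Total edge weight from `node` to each neighbouring community label."""
    totals = {}
    for nbr, weight in graph[node].items():
        totals[labels[nbr]] = totals.get(labels[nbr], 0.0) + weight
    return totals

def label_propagation(graph, max_iters=100):
    """Deterministic weighted label propagation (a stand-in community detector)."""
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            totals = label_weights(graph, labels, node)
            if not totals:
                continue  # isolated node keeps its own label
            # Heaviest label wins; ties go to the smallest label.
            best = min(totals, key=lambda lab: (-totals[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

def participation(graph, labels, node):
    """Participation coefficient: 0 when all edge weight stays in one community."""
    totals = label_weights(graph, labels, node)
    strength = sum(totals.values())
    if strength == 0:
        return 0.0
    return 1.0 - sum((w / strength) ** 2 for w in totals.values())

# Toy embeddings (hypothetical): two tight clusters plus one song in between.
angles = [0, 4, 10, 42, 80, 86, 90]
vectors = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles]

graph = similarity_graph(vectors, threshold=0.75)
labels = label_propagation(graph)
scores = {node: participation(graph, labels, node) for node in graph}
bridge = max(scores, key=scores.get)
print("communities:", labels)
print("most boundary-spanning node:", bridge)
```

On this toy input the song placed between the two clusters receives the highest score, while songs whose neighbours all share one community score zero. With only seven nodes the scores of the bridge and its nearest neighbour are close, so a real corpus and a more robust detector would be needed before drawing conclusions.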
Taxonomy
Topics: Music and Audio Processing · Asian Culture and Media Studies · Computational and Text Analysis Methods
