KpopMT: Translation Dataset with Terminology for Kpop Fandom
JiWoo Kim, Yunsu Kim, JinYeong Bak

TL;DR
KpopMT is a new translation dataset focused on capturing and translating the unique terminologies used within Kpop fandom social groups, highlighting current translation challenges for specialized social language.
Contribution
The paper introduces KpopMT, a dataset with expert-annotated translations of Kpop-specific terminology, addressing a gap in social group language translation.
Findings
Existing translation models perform poorly on Kpop-specific terminology.
KpopMT reveals the difficulty of translating social group-specific language.
Low scores of GPT models highlight the need for specialized translation approaches.
Abstract
While machines learn from existing corpora, humans have the unique capability to establish and accept new language systems. This leads humans to form unique language systems within social groups. In line with this, we focus on a remaining gap in addressing translation challenges within social groups, where in-group members use unique terminology. We propose the KpopMT dataset, which aims to fill this gap by enabling precise terminology translation, choosing Kpop fandom as an initial social group given its global popularity. Expert translators provide 1k English translations for Korean posts and comments, each annotated with specific terminology within social groups' language systems. We evaluate existing translation systems, including GPT models, on KpopMT to identify their failure cases. Results show overall low scores, underscoring the challenges of reflecting group-specific…
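To make the dataset description concrete, here is a minimal Python sketch of how a terminology-annotated example and a simple term-level check might look. Everything in it is an assumption for illustration: the `Example` fields, the `term_hit_rate` function, and the toy sentence are hypothetical and are not taken from KpopMT or from the paper's actual evaluation protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    """Hypothetical record layout; the real KpopMT schema may differ."""
    source: str      # Korean post or comment
    reference: str   # expert English translation
    # Annotated terminology: source-side term -> expected English rendering
    terms: dict[str, str] = field(default_factory=dict)


def term_hit_rate(examples, hypotheses):
    """Fraction of annotated terms whose expected English rendering
    appears (case-insensitive substring match) in the system output."""
    hits = 0
    total = 0
    for ex, hyp in zip(examples, hypotheses):
        hyp_lower = hyp.lower()
        for target in ex.terms.values():
            total += 1
            if target.lower() in hyp_lower:
                hits += 1
    return hits / total if total else 0.0


# Toy example written for this sketch, not drawn from the dataset.
toy = Example(
    source="내 최애 컴백했어",
    reference="My bias made a comeback",
    terms={"최애": "bias", "컴백": "comeback"},
)

if __name__ == "__main__":
    print(term_hit_rate([toy], ["My favorite came back"]))
    print(term_hit_rate([toy], ["My bias made a comeback"]))
```

Substring matching is a deliberately crude choice: it ignores inflection and paraphrase, so it should be read only as a rough indicator of whether fandom terms survive translation, alongside whatever sentence-level metrics a study reports.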
Taxonomy
Topics: Asian Culture and Media Studies
Methods: Attention Is All You Need · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Attention Dropout · Adam · Dropout · Weight Decay
