Fine-Tuning Large Language Models for Automatic Detection of Sexually Explicit Content in Spanish-Language Song Lyrics
Dolores Zamacola S\'anchez de Lamadrid, Eduardo C. Garrido-Merch\'an

TL;DR
This paper demonstrates that fine-tuning a GPT model on a small, curated dataset effectively detects sexually explicit Spanish-language song lyrics, outperforming generic models and supporting automated content moderation and policy development.
Contribution
It introduces a domain-specific fine-tuning approach for large language models to identify explicit lyrics in Spanish music, incorporating cultural and linguistic nuances.
Findings
Achieved 87% accuracy, 100% precision and specificity in detection.
Model agreement with human experts increased to 59.2%.
Supports automated moderation and policy proposals for music content.
Abstract
The proliferation of sexually explicit content in popular music genres such as reggaeton and trap, consumed predominantly by young audiences, has raised significant societal concern regarding the exposure of minors to potentially harmful lyrical material. This paper presents an approach to the automatic detection of sexually explicit content in Spanish-language song lyrics by fine-tuning a Generative Pre-trained Transformer (GPT) model on a curated corpus of 100 songs, evenly divided between expert-labeled explicit and non-explicit categories. The proposed methodology leverages transfer learning to adapt the pre-trained model to the idiosyncratic linguistic features of urban Latin music, including slang, metaphors, and culturally specific double entendres that evade conventional dictionary-based filtering systems. Experimental evaluation on held-out test sets demonstrates that the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHate Speech and Cyberbullying Detection · Artificial Intelligence in Games · Music and Audio Processing
