Fine-Tuning Large Language Models for Automatic Detection of Sexually Explicit Content in Spanish-Language Song Lyrics

Dolores Zamacola S\'anchez de Lamadrid; Eduardo C. Garrido-Merch\'an

arXiv:2602.05485·cs.CY·February 6, 2026

Fine-Tuning Large Language Models for Automatic Detection of Sexually Explicit Content in Spanish-Language Song Lyrics

Dolores Zamacola S\'anchez de Lamadrid, Eduardo C. Garrido-Merch\'an

PDF

Open Access

TL;DR

This paper demonstrates that fine-tuning a GPT model on a small, curated dataset effectively detects sexually explicit Spanish-language song lyrics, outperforming generic models and supporting automated content moderation and policy development.

Contribution

It introduces a domain-specific fine-tuning approach for large language models to identify explicit lyrics in Spanish music, incorporating cultural and linguistic nuances.

Findings

01

Achieved 87% accuracy, 100% precision and specificity in detection.

02

Model agreement with human experts increased to 59.2%.

03

Supports automated moderation and policy proposals for music content.

Abstract

The proliferation of sexually explicit content in popular music genres such as reggaeton and trap, consumed predominantly by young audiences, has raised significant societal concern regarding the exposure of minors to potentially harmful lyrical material. This paper presents an approach to the automatic detection of sexually explicit content in Spanish-language song lyrics by fine-tuning a Generative Pre-trained Transformer (GPT) model on a curated corpus of 100 songs, evenly divided between expert-labeled explicit and non-explicit categories. The proposed methodology leverages transfer learning to adapt the pre-trained model to the idiosyncratic linguistic features of urban Latin music, including slang, metaphors, and culturally specific double entendres that evade conventional dictionary-based filtering systems. Experimental evaluation on held-out test sets demonstrates that the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsHate Speech and Cyberbullying Detection · Artificial Intelligence in Games · Music and Audio Processing