Conditioning LLMs to Generate Code-Switched Text

Maite Heredia; Gorka Labaka; Jeremy Barnes; Aitor Soroa

arXiv:2502.12924·cs.CL·March 9, 2026

TL;DR

This paper explores how fine-tuning large language models with back-translated code-switched data improves their ability to generate fluent English-Spanish code-switched text, highlighting the importance of human-aligned evaluation methods.

Contribution

The paper introduces a fine-tuning approach based on back-translated CS data and analyses model performance through a human preference study, a qualitative error analysis, reference-based metrics, and LLM-based judgment.

Findings

01. Fine-tuning enhances fluency in CS text generation.

02. Traditional metrics do not align with human judgments.

03. LLM-based judgment correlates better with human preferences.
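Findings 02 and 03 are claims about how well automatic scores track human preferences. One common way to check this kind of alignment is a rank correlation between per-sentence metric scores and human ratings. The sketch below uses Spearman correlation, which is an assumption made here for illustration; the listing does not say which statistic the authors used, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_scores):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

Calling `spearman(metric_scores, human_scores)` once per metric (for example, once for a reference-based metric and once for LLM-judge scores) gives values that can be compared directly; a value closer to 1 means the metric orders outputs more like the human annotators do.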

Abstract

Code-switching (CS) is still a critical challenge in Natural Language Processing (NLP), due to the limited availability of large-scale, diverse CS datasets for robust training and evaluation. Despite recent advances, the capabilities and limitations of LLMs in handling CS are still not fully understood. In this work, we investigate the extent to which LLMs can be used in a framework for CS text generation, focusing on the English-Spanish language pair. Our proposed methodology consists of back-translating natural CS sentences into monolingual English, and using the resulting parallel corpus to fine-tune LLMs to turn monolingual sentences into CS. We thoroughly analyse the models' performance through a study on human preferences, a qualitative error analysis, an evaluation with popular reference-based metrics and LLM-based judgment. Results show that fine-tuning can be a key step to…
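The data preparation step described in the abstract (pairing each back-translated monolingual English sentence with its original code-switched sentence, then using the pairs for fine-tuning) can be sketched roughly as follows. This is a minimal sketch and not the authors' pipeline: the prompt template, the record fields `prompt` and `completion`, and the helper names are all assumptions introduced for illustration, and the example sentence was written for this sketch rather than taken from the paper's corpus.

```python
import json

# Hypothetical prompt template; the wording actually used in the paper is not given here.
PROMPT_TEMPLATE = (
    "Rewrite the following English sentence as English-Spanish "
    "code-switched text.\nEnglish: {source}\nCode-switched:"
)


def build_examples(pairs):
    """Turn (monolingual_english, code_switched) pairs into prompt/completion records.

    The monolingual side is the back-translation (model input); the natural
    code-switched sentence is the target the model learns to produce.
    """
    examples = []
    for english, code_switched in pairs:
        english, code_switched = english.strip(), code_switched.strip()
        if not english or not code_switched:
            continue  # skip pairs with an empty side
        examples.append(
            {
                "prompt": PROMPT_TEMPLATE.format(source=english),
                "completion": " " + code_switched,
            }
        )
    return examples


def to_jsonl(examples):
    """Serialise records as JSON Lines, keeping non-ASCII characters readable."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


# Illustrative pair written for this sketch, not drawn from the paper's data.
toy_pairs = [("I am going to the store tomorrow.", "I am going to la tienda mañana.")]
records = build_examples(toy_pairs)
jsonl = to_jsonl(records)
```

The resulting JSON Lines string could be written to disk and passed to whichever fine-tuning toolkit is in use; the direction of the pair (monolingual in, code-switched out) is the part that follows the abstract.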

Code & Models

Repositories

hitz-zentroa/cs-generation
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques