Learning the Topic, Not the Language: How LLMs Classify Online Immigration Discourse Across Languages
Andrea Nasuto, Stefano Maria Iacus, Francisco Rowe, Devika Jain

TL;DR
This paper presents a lightweight, open-source multilingual LLM framework that classifies online immigration discourse across 13 languages, enabling scalable, cost-effective, and inclusive social science research.
Contribution
It introduces a fine-tuned LLaMA 3.2-3B approach that generalizes topic classification to languages unseen during fine-tuning and corrects pretraining biases with minimal data from under-represented languages, outperforming prior models in cost and speed.
Findings
Multilingual fine-tuning improves ideological nuance detection.
The framework achieves 26-168x faster inference.
Cost savings exceed 1000x compared to commercial LLMs.
Abstract
Large language models (LLMs) offer new opportunities for scalable analysis of online discourse. Yet their use in multilingual social science research remains constrained by model size, cost, and linguistic bias. We develop a lightweight, open-source LLM framework using fine-tuned LLaMA 3.2-3B models to classify immigration-related tweets across 13 languages. Unlike prior work relying on BERT-style models or translation pipelines, we combine topic classification with stance detection and demonstrate that LLMs fine-tuned in just one or two languages can generalize topic understanding to unseen languages. Capturing ideological nuance, however, benefits from multilingual fine-tuning. Our approach corrects pretraining biases with minimal data from under-represented languages and avoids reliance on proprietary systems. With 26-168x faster inference and over 1000x cost savings compared to…
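The cross-lingual claim in the abstract (fine-tune in one or two languages, then test on unseen ones) can be checked by scoring predictions separately for languages seen and unseen during fine-tuning. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The function name, the label strings, the language codes, and the toy records are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


def accuracy_by_language(records, train_languages):
    """Score (language, gold_label, predicted_label) triples.

    Returns per-language accuracy plus the mean accuracy over languages
    that were in the fine-tuning set ("seen") and those that were not
    ("unseen").
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, gold, pred in records:
        total[lang] += 1
        correct[lang] += int(gold == pred)

    per_lang = {lang: correct[lang] / total[lang] for lang in total}
    seen = [acc for lang, acc in per_lang.items() if lang in train_languages]
    unseen = [acc for lang, acc in per_lang.items() if lang not in train_languages]

    def mean(values):
        # NaN signals that a group is empty rather than silently returning 0.
        return sum(values) / len(values) if values else float("nan")

    return per_lang, mean(seen), mean(unseen)


# Hypothetical toy data: labels and language codes are illustrative only
# and are not taken from the paper.
records = [
    ("en", "immigration", "immigration"),
    ("en", "other", "other"),
    ("es", "immigration", "immigration"),
    ("es", "other", "immigration"),
    ("pl", "immigration", "immigration"),
    ("pl", "other", "other"),
    ("tr", "immigration", "other"),
    ("tr", "other", "other"),
]

per_lang, seen_acc, unseen_acc = accuracy_by_language(records, {"en", "es"})
```

A small gap between the seen and unseen means would be consistent with the topic-generalization claim; running the same comparison on stance labels would show whether ideological nuance transfers as well.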
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
