Evaluating LLMs Robustness in Less Resourced Languages with Proxy Models

Maciej Chrabąszcz; Katarzyna Lorenc; Karolina Seweryn

arXiv:2506.07645 · cs.CL · June 10, 2025

TL;DR

This paper shows that multilingual LLMs are vulnerable to simple, cheap character- and word-level attacks in low-resource languages such as Polish. It highlights the resulting safety concerns and proposes a methodology for assessing these vulnerabilities.

Contribution

It introduces an attack construction method that uses a small proxy model to compute word importance, exposing vulnerabilities of multilingual LLMs in low-resource languages such as Polish.

Findings

01 Character- and word-level attacks drastically alter LLM predictions

02 Vulnerabilities are present in LLMs for low-resource languages

03 The proposed methodology can be extended to other languages

Abstract

Large language models (LLMs) have demonstrated impressive capabilities across various natural language processing (NLP) tasks in recent years. However, their susceptibility to jailbreaks and perturbations necessitates additional evaluations. Many LLMs are multilingual, but safety-related training data contains mainly high-resource languages like English. This can leave them vulnerable to perturbations in low-resource languages such as Polish. We show how surprisingly strong attacks can be cheaply created by altering just a few characters and using a small proxy model for word importance calculation. We find that these character and word-level attacks drastically alter the predictions of different LLMs, suggesting a potential vulnerability that can be used to circumvent their internal safety mechanisms. We validate our attack construction methodology on Polish, a low-resource language,…
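The idea of ranking words with a proxy model and then altering a few characters can be illustrated with a minimal sketch. This is not the authors' implementation: the leave-one-out importance measure, the adjacent-character swap, and every name below (`toy_proxy_score`, `word_importance`, `swap_adjacent_chars`, `attack`) are assumptions made for illustration, and the keyword-based scorer is a hypothetical stand-in for a real small proxy classifier.

```python
import random
from typing import Callable, List


def toy_proxy_score(text: str) -> float:
    """Hypothetical stand-in for a small proxy model.

    Counts tokens found in a tiny hand-picked word list; a real proxy
    would be a trained classifier returning a class probability.
    """
    flagged = {"okropny"}  # assumption: illustrative Polish word ("terrible")
    return float(sum(tok in flagged for tok in text.lower().split()))


def word_importance(words: List[str], score: Callable[[str], float]) -> List[float]:
    """Leave-one-out importance: score drop when each word is removed."""
    base = score(" ".join(words))
    return [
        base - score(" ".join(words[:i] + words[i + 1:]))
        for i in range(len(words))
    ]


def swap_adjacent_chars(word: str, rng: random.Random) -> str:
    """Swap two adjacent interior characters, keeping first and last fixed."""
    if len(word) < 4:
        return word  # too short to perturb without touching the edges
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def attack(text: str, score: Callable[[str], float], budget: int = 1, seed: int = 0) -> str:
    """Perturb the `budget` most important words according to the proxy."""
    rng = random.Random(seed)
    words = text.split()
    importances = word_importance(words, score)
    ranked = sorted(range(len(words)), key=lambda i: importances[i], reverse=True)
    for i in ranked[:budget]:
        words[i] = swap_adjacent_chars(words[i], rng)
    return " ".join(words)


if __name__ == "__main__":
    original = "ten film jest okropny"
    adversarial = attack(original, toy_proxy_score, budget=1)
    print(original, "->", adversarial)
    print(toy_proxy_score(original), "->", toy_proxy_score(adversarial))
```

In this sketch the perturbed text would then be sent to the larger target LLM. The proxy is only queried to decide which words to modify, which is what keeps the attack cheap.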

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection