Impact of Non-Standard Unicode Characters on Security and Comprehension in Large Language Models

Johan S Daniel; Anand Pal

arXiv:2405.14490 · cs.CL · May 24, 2024 · 1 citation

TL;DR

This paper investigates how non-standard Unicode characters affect the security and comprehension of large language models, revealing increased vulnerabilities and suggesting improvements in training data to mitigate risks.

Contribution

It provides a comparative analysis of fifteen models' vulnerabilities to Unicode-based manipulations, highlights the impact on safety mechanisms, and proposes including non-standard Unicode text in training data.

Findings

1. Non-standard Unicode reduces guardrail effectiveness.
2. Models become more vulnerable to content policy breaches.
3. Inclusion of non-standard Unicode in training can improve model robustness.
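To make "non-standard Unicode" concrete: this summary does not say which character sets the study used, so the sketch below assumes two common ones, the fullwidth forms (U+FF01 to U+FF5E) and the Mathematical Bold letters (U+1D400 onward). The helpers `to_fullwidth` and `to_math_bold` are hypothetical. The sketch shows that the rewritten prompt looks the same to a reader but occupies more UTF-8 bytes (which is what a byte-level BPE tokenizer sees), and that standard Unicode NFKC normalization folds both variants back to ASCII. NFKC is shown only to demonstrate that these are compatibility variants of ordinary letters; it is not the mitigation the paper proposes.

```python
import unicodedata


def to_fullwidth(text: str) -> str:
    """Rewrite printable ASCII as Unicode fullwidth forms (hypothetical helper)."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            # U+0021..U+007E map one-to-one onto U+FF01..U+FF5E (offset 0xFEE0).
            out.append(chr(code + 0xFEE0))
        elif ch == " ":
            # U+3000 IDEOGRAPHIC SPACE is the fullwidth counterpart of U+0020.
            out.append("\u3000")
        else:
            out.append(ch)
    return "".join(out)


def to_math_bold(text: str) -> str:
    """Rewrite ASCII letters as Mathematical Bold letters (hypothetical helper)."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D400 + ord(ch) - ord("A")))  # U+1D400..U+1D419
        elif "a" <= ch <= "z":
            out.append(chr(0x1D41A + ord(ch) - ord("a")))  # U+1D41A..U+1D433
        else:
            out.append(ch)  # digits, spaces, and punctuation are left unchanged
    return "".join(out)


if __name__ == "__main__":
    prompt = "Hello world"
    for variant in (prompt, to_fullwidth(prompt), to_math_bold(prompt)):
        # A byte-level tokenizer sees the UTF-8 bytes, which grow for these variants.
        n_bytes = len(variant.encode("utf-8"))
        # NFKC maps both compatibility forms back to plain ASCII.
        folded = unicodedata.normalize("NFKC", variant)
        print(variant, n_bytes, folded == prompt)
```

For "Hello world" the three variants occupy 11, 33, and 41 UTF-8 bytes respectively (fullwidth characters take 3 bytes each, Mathematical Bold letters 4), and all three normalize back to the original string.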

Abstract

The advancement of large language models has significantly improved natural language processing. However, challenges such as jailbreaks (prompt injections that cause an LLM to follow instructions contrary to its intended use), hallucinations (generating incorrect or misleading information), and comprehension errors remain prevalent. In this report, we present a comparative analysis of the performance of fifteen distinct models, with each model undergoing a standardized test comprising 38 queries across three key metrics: jailbreaks, hallucinations, and comprehension errors. The models are assessed based on the total occurrences of jailbreaks, hallucinations, and comprehension errors. Our work exposes these models' inherent vulnerabilities and challenges the notion that these models achieve human-level language comprehension. We have empirically analysed the impact of non-standard Unicode…
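The abstract assesses each model by the total number of jailbreaks, hallucinations, and comprehension errors over a 38-query test. The sketch below shows one way such a tally could be computed; it assumes each query's graded response is recorded as a set of metric labels (so each metric counts at most once per query) and that a lower total is better. `QueryOutcome`, `tally`, `rank_models`, and the model names are hypothetical, and the example outcomes are placeholders, not data or results from the paper. A real run would hold 38 outcomes per model.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# The three metrics named in the abstract.
METRICS = ("jailbreak", "hallucination", "comprehension_error")


@dataclass
class QueryOutcome:
    """Hypothetical record of one graded response to one test query."""
    query_id: int
    # Metrics observed in the response; assumed to count at most once per query.
    labels: set[str] = field(default_factory=set)


def tally(outcomes: list[QueryOutcome]) -> dict[str, int]:
    """Count occurrences of each metric over one model's test run, plus a total."""
    counts = Counter()
    for outcome in outcomes:
        unknown = outcome.labels - set(METRICS)
        if unknown:
            raise ValueError(f"query {outcome.query_id}: unknown labels {sorted(unknown)}")
        counts.update(outcome.labels)  # adds 1 for each label present
    summary = {metric: counts[metric] for metric in METRICS}
    summary["total"] = sum(summary.values())
    return summary


def rank_models(results: dict[str, list[QueryOutcome]]) -> list[tuple[str, int]]:
    """Order models by total failures, fewest first (assumed: lower is better)."""
    totals = {name: tally(outcomes)["total"] for name, outcomes in results.items()}
    # Ties are broken alphabetically by model name so the order is deterministic.
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Placeholder outcomes for illustration only; NOT data from the paper.
    results = {
        "model_a": [
            QueryOutcome(1, {"jailbreak"}),
            QueryOutcome(2),
            QueryOutcome(3, {"hallucination", "comprehension_error"}),
        ],
        "model_b": [
            QueryOutcome(1),
            QueryOutcome(2, {"comprehension_error"}),
            QueryOutcome(3),
        ],
    }
    for name, outcomes in results.items():
        print(name, tally(outcomes))
    print(rank_models(results))
```

Recording labels as sets keeps the grading step separate from the scoring step, so a different counting rule (for example, several occurrences per query) would only change how `QueryOutcome` is filled in.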

Code & Models

Repositories

raidedcluster/non-standard_unicode_jailbreaks (official)

Taxonomy

Topics: Advanced Malware Detection Techniques

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout