Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings

Mingchen Li; Wajdi Aljedaani; Yingjie Liu; Navyasri Meka; Xuan Lu; Xinyue Ye; Junhua Ding; Yunhe Feng

arXiv:2604.06863 · cs.SI · April 9, 2026

TL;DR

This study compares bias in skin-tone emoji representations across dedicated emoji embedding models (emoji2vec, emoji-sw2v) and four large language models (Llama, Gemma, Qwen, and Mistral). It reports systemic disparities and biases that could undermine social inclusion online.

Contribution

The paper provides the first large-scale comparison of skin-tone emoji bias in dedicated emoji models versus modern LLMs, highlighting a critical performance gap in skin tone modifier support and societal biases in the learned representations.

Findings

01. LLMs support skin tone modifiers more robustly than emoji-specific models.

02. Specialized emoji models show significant deficiencies in representing skin tones.

03. Emoji representations show evidence of skewed sentiment and inconsistent meanings across skin tones.

Abstract

Skin-toned emojis are crucial for fostering personal identity and social inclusion in online communication. As AI models, particularly Large Language Models (LLMs), increasingly mediate interactions on web platforms, the risk that these systems perpetuate societal biases through their representation of such symbols is a significant concern. This paper presents the first large-scale comparative study of bias in skin-toned emoji representations across two distinct model classes. We systematically evaluate dedicated emoji embedding models (emoji2vec, emoji-sw2v) against four modern LLMs (Llama, Gemma, Qwen, and Mistral). Our analysis first reveals a critical performance gap: while LLMs demonstrate robust support for skin tone modifiers, widely used specialized emoji models exhibit severe deficiencies. More importantly, a multi-faceted investigation into semantic consistency,…
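To make "support for skin tone modifiers" concrete, here is a minimal Python sketch of how skin-toned variants are formed in Unicode and how coverage of those variants in a model vocabulary might be measured. This is not the authors' evaluation code. The helper names (`skin_tone_variants`, `coverage`) and the toy vocabulary are hypothetical, and the sketch assumes single-code-point base emojis. The only fact taken as given is that Unicode defines five Fitzpatrick modifiers at U+1F3FB through U+1F3FF, which are appended to a base emoji.

```python
# Unicode defines five Fitzpatrick skin tone modifiers: U+1F3FB..U+1F3FF.
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def skin_tone_variants(base_emoji: str) -> list[str]:
    """Return the five skin-toned variants of a single-code-point base emoji.

    Hypothetical helper: a toned emoji is the base followed by one modifier.
    """
    return [base_emoji + modifier for modifier in SKIN_TONE_MODIFIERS]


def coverage(vocab: set[str], base_emojis: list[str]) -> float:
    """Fraction of skin-toned variants that appear in a model vocabulary.

    Hypothetical metric for illustration; the paper may define support
    differently.
    """
    variants = [v for base in base_emojis for v in skin_tone_variants(base)]
    if not variants:
        return 0.0
    return sum(v in vocab for v in variants) / len(variants)


# Toy vocabulary (an assumption, not real model data): the waving hand,
# plus its lightest and darkest toned variants only.
WAVING_HAND = "\U0001F44B"
toy_vocab = {WAVING_HAND, WAVING_HAND + "\U0001F3FB", WAVING_HAND + "\U0001F3FF"}

print(coverage(toy_vocab, [WAVING_HAND]))  # 2 of 5 variants present -> 0.4
```

A vocabulary lookup like this only applies to models with a fixed emoji vocabulary. For an LLM with a subword tokenizer, a toned emoji is typically split into several tokens, so support would have to be assessed on the resulting embeddings rather than by membership.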
