Double Jeopardy and Climate Impact in the Use of Large Language Models: Socio-economic Disparities and Reduced Utility for Non-English Speakers
Aivin V. Solatorio, Gabriel Stefanini Vicente, Holly Krambeck, Olivier Dupriez

TL;DR
This paper highlights how large language models disproportionately benefit English speakers: speakers of low-resource languages face higher tokenization costs and weaker model performance, which exacerbates socio-economic inequalities and environmental impacts.
Contribution
It reveals the socio-economic and linguistic disparities in LLM access and performance, emphasizing the need for fairer AI development for low-resource languages.
Findings
English speakers face lower costs with LLMs due to tokenization.
Around 1.5 billion people, speaking languages primarily from lower-middle-income countries, could incur costs 4 to 6 times higher than English speakers.
LLMs perform poorly in low-resource languages, worsening inequalities.
Abstract
Artificial Intelligence (AI), particularly large language models (LLMs), holds the potential to bridge language and information gaps, which can benefit the economies of developing nations. However, our analysis of FLORES-200, FLORES+, Ethnologue, and World Development Indicators data reveals that these benefits largely favor English speakers. Speakers of languages in low-income and lower-middle-income countries face higher costs when using OpenAI's GPT models via APIs because of how the system processes the input -- tokenization. Around 1.5 billion people, speaking languages primarily from lower-middle-income countries, could incur costs that are 4 to 6 times higher than those faced by English speakers. Disparities in LLM performance are significant, and tokenization in models priced per token amplifies inequalities in access, cost, and utility. Moreover, using the quality of…
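The cost mechanism described in the abstract can be illustrated with a minimal sketch. It assumes a flat price per token, so that the cost ratio between two languages for the same sentence equals the ratio of their token counts. The function names, the token counts, and the price are hypothetical placeholders chosen for illustration. They are not taken from the paper or from any provider's price list.

```python
def token_cost(num_tokens: int, price_per_1k_tokens: float) -> float:
    """Cost of processing num_tokens under flat per-token pricing."""
    return num_tokens / 1000 * price_per_1k_tokens


def cost_premium(tokens_in_language: int, tokens_in_english: int) -> float:
    """Token-count ratio for the same sentence in two languages.

    Under a flat per-token price this ratio is also the cost ratio.
    """
    if tokens_in_english <= 0:
        raise ValueError("tokens_in_english must be positive")
    return tokens_in_language / tokens_in_english


# Placeholder values, invented for illustration only (not from the paper).
english_tokens = 20      # tokens for a sentence in English
other_tokens = 100       # tokens for the same sentence in another language
price_per_1k = 0.01      # hypothetical price per 1,000 tokens

premium = cost_premium(other_tokens, english_tokens)
print(f"premium: {premium:.1f}x")
print(f"English cost: {token_cost(english_tokens, price_per_1k):.6f}")
print(f"Other-language cost: {token_cost(other_tokens, price_per_1k):.6f}")
```

With real data, the token counts would come from running an actual tokenizer over parallel sentences, such as those in FLORES-200. The sketch only shows how a token-count gap translates directly into a cost gap.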
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Multi-Head Attention · Dense Connections · Residual Connection · Dropout · Layer Normalization · Linear Warmup With Cosine Annealing
