Double Jeopardy and Climate Impact in the Use of Large Language Models: Socio-economic Disparities and Reduced Utility for Non-English Speakers

Aivin V. Solatorio; Gabriel Stefanini Vicente; Holly Krambeck; Olivier Dupriez

arXiv:2410.10665 · cs.CL · October 15, 2024 · 2 cites

Open Access · 1 Repo

TL;DR

This paper shows that large language models disproportionately benefit English speakers: speakers of low-resource languages face higher tokenization costs and weaker model performance, which exacerbates socio-economic inequalities and increases environmental impact.

Contribution

The paper documents socio-economic and linguistic disparities in LLM access, cost, and performance, and argues for fairer AI development for low-resource languages.

Findings

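01. English speakers face lower costs when using LLMs priced per token, because of how tokenization handles their language.

02. Around 1.5 billion people, speaking languages primarily from lower-middle-income countries, could incur costs 4 to 6 times higher than English speakers.

03. LLMs perform poorly in low-resource languages, which worsens existing inequalities.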

Abstract

Artificial Intelligence (AI), particularly large language models (LLMs), holds the potential to bridge language and information gaps, which can benefit the economies of developing nations. However, our analysis of FLORES-200, FLORES+, Ethnologue, and World Development Indicators data reveals that these benefits largely favor English speakers. Speakers of languages in low-income and lower-middle-income countries face higher costs when using OpenAI's GPT models via APIs because of how the system processes the input -- tokenization. Around 1.5 billion people, speaking languages primarily from lower-middle-income countries, could incur costs that are 4 to 6 times higher than those faced by English speakers. Disparities in LLM performance are significant, and tokenization in models priced per token amplifies inequalities in access, cost, and utility. Moreover, using the quality of…
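The cost gap described in the abstract comes down to simple arithmetic: when a model is priced per token, a language that needs more tokens to express the same sentence costs proportionally more. The sketch below illustrates that ratio under stated assumptions. The function names (`tokenization_premium`, `relative_cost`, `utf8_byte_count`) are hypothetical and not taken from the paper or its repository, and the byte-counting tokenizer is only a stand-in for a real GPT tokenizer.

```python
from typing import Callable, Sequence, Tuple


def tokenization_premium(
    parallel_sentences: Sequence[Tuple[str, str]],
    count_tokens: Callable[[str], int],
) -> float:
    """Ratio of target-language tokens to English tokens.

    Each pair is (english_sentence, target_sentence) with the same meaning,
    as in a parallel corpus. A value of 4.0 means the target language needs
    four times as many tokens as English for the same content.
    """
    english_total = sum(count_tokens(en) for en, _ in parallel_sentences)
    target_total = sum(count_tokens(tgt) for _, tgt in parallel_sentences)
    if english_total == 0:
        raise ValueError("English side of the corpus produced no tokens")
    return target_total / english_total


def relative_cost(premium: float, english_cost: float) -> float:
    """Cost of the same content in the target language under per-token pricing."""
    return premium * english_cost


def utf8_byte_count(text: str) -> int:
    """Stand-in tokenizer (assumption): one token per UTF-8 byte.

    Real subword tokenizers behave differently; this only mimics the
    tendency of non-Latin scripts to expand into more units.
    """
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    # Toy pair for illustration only, not data from the paper.
    pairs = [("Hello", "नमस्ते")]
    premium = tokenization_premium(pairs, utf8_byte_count)
    print(f"premium: {premium:.2f}")
    print(f"cost if English costs 1.00: {relative_cost(premium, 1.00):.2f}")
```

To approximate the paper's setting, `count_tokens` would be replaced by the length of a real tokenizer's output and the pairs by sentences from a parallel corpus such as FLORES-200; the numbers produced by the byte-counting stand-in should not be read as results.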

Code & Models

Repositories

worldbank/double-jeopardy-in-llms (Official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Multi-Head Attention · Dense Connections · Residual Connection · Dropout · Layer Normalization · Linear Warmup With Cosine Annealing