The Feasibility of Training Sovereign Language Models in the Global South: A Study of Brazil and Mexico
Sandra Malagon (1, 2), Monica A. Ulloa Ruiz (1, 2), Tatiana Elizabeth Sandoval Plaza (1), Gabriel Rafael Rosario Bolívar (1), Valentina García Mesa (1), and Ivanna Alvarado Morales (1) ((1) Carreras con Impacto, (2) AIxo)

TL;DR
This study assesses the technical and financial feasibility of training large language models in Brazil and Mexico, emphasizing hardware efficiency, energy use, and policy strategies to enable local AI development within resource constraints.
Contribution
It provides a detailed analysis of the compute, energy, and cost requirements for sovereign-scale language model training in the Global South, proposing policy levers like extended training timelines.
Findings
Training on H100 hardware is feasible at a cost of 8-14 million USD
Training on A100 hardware costs 19-32 million USD because of its higher resource demands
Extending training timelines can mitigate hardware constraints and support local AI sovereignty
Abstract
The rapid escalation of computational requirements for training large-scale language models has reinforced structural asymmetries between high-capacity jurisdictions and countries in the Global South. This paper examines the technical and fiscal feasibility of sovereign-scale language model training in Brazil and Mexico under conditions of constrained hardware access, energy availability, and fiscal ceilings. Using a dual-axis design that varies accelerator generation (NVIDIA H100 vs. A100) and training duration (90 vs. 150 days), we estimate compute demand, energy consumption, capital expenditures, and regulatory compatibility for the training of a 10-trillion-token model. Our findings show that while all configurations remain below export-control and electrical infrastructure thresholds, fiscal viability is determined by hardware efficiency. H100-based scenarios achieve training…
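The dual-axis design described in the abstract (accelerator generation crossed with training duration) can be illustrated with a rough sizing sketch. This is not the authors' model: it assumes the common approximation of 6 FLOPs per parameter per token, and the parameter count, utilization, and PUE values are hypothetical placeholders, since the summary gives only the 10-trillion-token budget. The peak-throughput and power figures are nominal vendor values for dense BF16 on SXM parts and are also assumptions here.

```python
import math

SECONDS_PER_DAY = 86_400

# Assumed nominal figures (not taken from the paper):
# peak dense BF16 throughput in FLOP/s and board power in watts.
ACCELERATORS = {
    "H100": {"peak_flops": 989e12, "watts": 700},
    "A100": {"peak_flops": 312e12, "watts": 400},
}

TOKENS = 10e12        # 10-trillion-token budget, from the abstract
PARAMS = 70e9         # hypothetical parameter count, not stated in the source
UTILIZATION = 0.40    # hypothetical fraction of peak throughput achieved
PUE = 1.3             # hypothetical data-center power usage effectiveness


def gpus_required(params, tokens, peak_flops, utilization, days):
    """Accelerators needed to finish training within `days`."""
    total_flops = 6 * params * tokens            # 6*N*D approximation
    flops_per_gpu = peak_flops * utilization * days * SECONDS_PER_DAY
    return math.ceil(total_flops / flops_per_gpu)


def energy_mwh(num_gpus, watts_per_gpu, days, pue):
    """Facility energy in MWh, counting accelerator power only."""
    return num_gpus * watts_per_gpu * pue * days * 24 / 1e6


if __name__ == "__main__":
    for name, spec in ACCELERATORS.items():
        for days in (90, 150):
            n = gpus_required(PARAMS, TOKENS, spec["peak_flops"], UTILIZATION, days)
            e = energy_mwh(n, spec["watts"], days, PUE)
            print(f"{name}, {days} days: {n} GPUs, about {e:,.0f} MWh")
```

Under these assumptions, lengthening the schedule from 90 to 150 days lowers the number of accelerators that must be procured at once, while the total energy stays roughly constant for a given accelerator generation. That is the intuition behind treating training duration as a policy lever.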
Taxonomy
Topics: Big Data and Digital Economy · ICT in Developing Communities · Green IT and Sustainability
