TL;DR
CarbonScaling is a hardware-aware analytical framework for estimating the carbon footprint of large language model training. It accounts for system-level factors such as hardware heterogeneity, distributed parallelism, and communication overhead, and covers both operational and embodied emissions.
Contribution
It extends neural scaling laws by integrating hardware heterogeneity, parallelism, and carbon accounting into a unified model for more precise emissions estimation.
Findings
Higher fidelity in carbon footprint estimation compared to regression baselines.
Embodied carbon becomes significant at trillion-parameter scales.
Framework supports modeling diverse parallelism strategies and hardware configurations.
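To make the last finding concrete, here is a minimal sketch of what enumerating parallelism layouts for a fixed device count can look like. It is not CarbonScaling's actual model: the function name `feasible_layouts`, the `max_tp` cap, and the weights-only memory check (weights sharded across tensor and pipeline ranks, replicated across data-parallel ranks) are all illustrative assumptions, and expert parallelism is left out.

```python
def feasible_layouts(num_devices, params, bytes_per_param, device_mem_bytes, max_tp=8):
    """Return every (tp, pp, dp) with tp * pp * dp == num_devices whose
    per-device weight shard fits in device memory.

    Assumption (hypothetical, for illustration only): memory use is
    weights only, split evenly over tp * pp devices and replicated
    across the dp data-parallel replicas.
    """
    layouts = []
    for tp in range(1, max_tp + 1):
        if num_devices % tp:
            continue  # tensor-parallel degree must divide the device count
        rest = num_devices // tp
        for pp in range(1, rest + 1):
            if rest % pp:
                continue  # pipeline degree must divide what is left
            dp = rest // pp
            shard_bytes = params * bytes_per_param / (tp * pp)
            if shard_bytes <= device_mem_bytes:
                layouts.append((tp, pp, dp))
    return layouts


if __name__ == "__main__":
    # Hypothetical numbers: 70e9 parameters at 2 bytes each on 64 devices
    # with 80 GB of memory per device.
    for layout in feasible_layouts(64, 70e9, 2, 80e9):
        print(layout)
```

A real framework would also account for activations, optimizer state, and communication cost per layout. The sketch only shows that the search space is a set of factorizations filtered by a feasibility constraint.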
Abstract
Large language models (LLMs) increasingly follow neural scaling laws that tie performance gains to rapidly expanding computational budgets, raising concerns about the sustainability of frontier-scale training. Existing carbon-estimation methods largely depend on regression over historical runs and fail to capture critical system-level factors, including hardware heterogeneity, distributed parallelism, communication overhead, and architectural sparsity. We present CarbonScaling, a hardware-aware analytical framework for modeling the carbon scaling behavior of frontier LLM training. The framework integrates neural scaling laws, distributed training strategies, accelerator and interconnect modeling, and operational and embodied carbon accounting to estimate feasible hardware configurations and associated emissions. CarbonScaling jointly models tensor, pipeline, data, and expert…
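The split between operational and embodied carbon mentioned in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not the CarbonScaling model. It assumes the widely used approximation of about 6 FLOPs per parameter per training token for dense transformers, a fixed utilization, and embodied carbon amortized linearly over device lifetime. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    params: float                     # model parameters
    tokens: float                     # training tokens
    num_devices: int                  # accelerators used
    peak_flops: float                 # peak FLOP/s per accelerator
    utilization: float                # achieved fraction of peak (assumed constant)
    device_power_w: float             # average power draw per accelerator, watts
    pue: float                        # datacenter power usage effectiveness
    grid_kgco2_per_kwh: float         # grid carbon intensity
    embodied_kgco2_per_device: float  # manufacturing footprint per accelerator
    device_lifetime_h: float          # service life used for amortization


def training_hours(run: TrainingRun) -> float:
    # Assumption: total compute is roughly 6 * params * tokens FLOPs.
    total_flops = 6.0 * run.params * run.tokens
    throughput = run.num_devices * run.peak_flops * run.utilization
    return total_flops / throughput / 3600.0


def operational_kgco2(run: TrainingRun) -> float:
    # Energy drawn by the accelerators, scaled by PUE for facility overhead.
    it_power_kw = run.num_devices * run.device_power_w / 1000.0
    energy_kwh = it_power_kw * training_hours(run) * run.pue
    return energy_kwh * run.grid_kgco2_per_kwh


def embodied_kgco2(run: TrainingRun) -> float:
    # Charge the run its time share of each device's manufacturing footprint.
    share = training_hours(run) / run.device_lifetime_h
    return run.num_devices * run.embodied_kgco2_per_device * share


if __name__ == "__main__":
    # All numbers below are made up for illustration.
    run = TrainingRun(
        params=1e9, tokens=1e9, num_devices=10, peak_flops=1e15,
        utilization=0.5, device_power_w=1000.0, pue=1.5,
        grid_kgco2_per_kwh=0.4, embodied_kgco2_per_device=100.0,
        device_lifetime_h=10000.0,
    )
    print(training_hours(run), operational_kgco2(run), embodied_kgco2(run))
```

Under these assumptions both terms scale with training time, so their ratio depends on power draw, grid intensity, and per-device manufacturing footprint rather than on model size alone. An analytical framework like the one described would replace the constant utilization with a model of parallelism and communication overhead.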
Taxonomy
Topics: Machine Learning in Materials Science · Big Data and Digital Economy · Topic Modeling
