Stop Preaching and Start Practising Data Frugality for Responsible Development of AI
Sophia N. Wilson, Guðrún Fjóla Guðmundsdóttir, Andrew Millard, Raghavendra Selvan, Sebastian Mair

TL;DR
This paper advocates for adopting data frugality in AI development, emphasizing environmental sustainability and efficiency, supported by empirical evidence showing that subset selection can reduce energy use with minimal accuracy loss.
Contribution
It highlights the environmental costs of data scaling and demonstrates practical benefits of data frugality through empirical experiments on subset selection methods.
Findings
Coreset-based subset selection reduces training energy consumption.
Data frugality can mitigate dataset bias.
Large-scale data growth yields diminishing returns in performance.
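To make the first finding concrete, here is a minimal sketch of one common coreset-style selection rule, k-center greedy, which picks a small subset that covers the feature space. This is an illustrative assumption: the summary does not say which subset selection methods the paper evaluates, and the function name, the budget, and the random feature matrix below are all hypothetical.

```python
import numpy as np


def k_center_greedy(features, budget, seed=0):
    """Return `budget` row indices chosen by k-center greedy selection.

    Illustrative only: not necessarily a method used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of rows")

    # Start from a random point, then repeatedly add the point that is
    # farthest from all centers selected so far.
    selected = [int(rng.integers(n))]
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(features - features[nxt], axis=1)
        # Keep, for each point, the distance to its nearest center.
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(selected)


# Hypothetical stand-in for learned embeddings of a training set.
features = np.random.default_rng(0).normal(size=(1000, 16))
subset = k_center_greedy(features, budget=100)
```

Training on `features[subset]` (here 10% of the data) instead of the full set is the kind of reduction the finding refers to. The actual energy savings and accuracy loss depend on the dataset, model, and selection method.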
Abstract
This position paper argues that the machine learning community must move from preaching to practising data frugality for responsible artificial intelligence (AI) development. For a long time, progress has been equated with ever-larger datasets, driving remarkable advances but now yielding diminishing performance gains alongside rising energy use and carbon emissions. While awareness of data-frugal approaches has grown, their adoption has remained rhetorical, and data scaling continues to dominate development practice. We argue that this gap between preaching and practice must be closed, as continued data scaling entails substantial and under-accounted environmental impacts. To ground our position, we provide indicative estimates of the energy use and carbon emissions associated with the downstream use of ImageNet-1K. We then present empirical evidence that data frugality is both…
Taxonomy
Topics: Ethics and Social Impacts of AI · Green IT and Sustainability · ICT in Developing Communities
