Not Every AI Problem is a Data Problem: We Should Be Intentional About Data Scaling
Tanya Rodchenko, Natasha Noy, Nino Scherrer

TL;DR
This paper emphasizes the importance of intentional data acquisition for AI, highlighting that understanding data structure and task types can optimize scaling efforts and inform future compute paradigms.
Contribution
It introduces the idea that data quality and structure, not just quantity, should guide data scaling strategies for AI development.
Findings
Data structure influences task scalability
Not all tasks benefit equally from data scaling
Guidelines for targeted data acquisition
Abstract
Large Language Models require ever more data to train and scale. Rather than acquiring whatever data is available, we should consider which types of tasks are most likely to benefit from data scaling, and be intentional in our data acquisition. We argue that the shape of the data itself, such as its compositional and structural patterns, informs which tasks to prioritize in data scaling and shapes the development of the next generation of compute paradigms for tasks where data scaling is inefficient or even insufficient.
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
