Graceful Forgetting II. Data as a Process
Alain de Cheveigné

TL;DR
This paper treats data curation as an ongoing process: to maximize the future value of data under limited storage while the data keep growing, the data should be held as summary statistics that are rescaled over time, with implications for learning and storage.
Contribution
It introduces a novel perspective on data curation as an ongoing rescaling process involving summary statistics to optimize future data utility.
Findings
Stored data expand without bound, in some cases with an exponentially increasing accrual rate, which complicates processing, transmission, and storage.
Curation involves creating summary statistics that are rescaled over time.
Rescaling can aid learning but must be carefully managed to maintain relevance.
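As a rough illustration of the findings above, the sketch below shows one way summary statistics could be rescaled to free storage. It is not taken from the paper: the `Summary` class, the choice of statistics (count, sum, sum of squares), and the pairwise-merge `rescale` function are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical summary of a block of samples (assumed, not from the paper)."""
    n: int           # number of samples summarized
    total: float     # sum of the samples
    total_sq: float  # sum of the squared samples

    @classmethod
    def from_samples(cls, samples):
        samples = list(samples)
        return cls(len(samples), sum(samples), sum(x * x for x in samples))

    def merge(self, other):
        # These statistics are additive, so two summaries combine exactly.
        return Summary(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    @property
    def mean(self):
        return self.total / self.n

    @property
    def variance(self):
        # Population variance recovered from the stored sums.
        return self.total_sq / self.n - self.mean ** 2


def rescale(summaries):
    """Roughly halve storage by merging adjacent pairs of summaries.

    An unpaired trailing summary is kept unchanged.
    """
    merged = [a.merge(b) for a, b in zip(summaries[0::2], summaries[1::2])]
    if len(summaries) % 2:
        merged.append(summaries[-1])
    return merged


# Usage: four fine-grained summaries become two coarser ones.
blocks = [Summary.from_samples([float(i), float(i + 1)]) for i in range(0, 8, 2)]
coarse = rescale(blocks)
print(len(blocks), len(coarse), coarse[0].mean)
```

Each call to `rescale` trades temporal resolution for space: the mean and variance of the merged span remain recoverable, but the detail within each merged pair is lost for good.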
Abstract
Data are rapidly growing in size and importance for society, a trend motivated by their enabling power. The accumulation of new data, sustained by progress in technology, leads to a boundless expansion of stored data, in some cases with an exponential increase in the accrual rate itself. Massive data are hard to process, transmit, store, and exploit, and it is particularly hard to keep abreast of the data store as a whole. This paper distinguishes three phases in the life of data: acquisition, curation, and exploitation. Each involves a distinct process, that may be separated from the others in time, with a different set of priorities. The function of the second phase, curation, is to maximize the future value of the data given limited storage. I argue that this requires that (a) the data take the form of summary statistics and (b) these statistics follow an endless process of…
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Data Storage Technologies · Time Series Analysis and Forecasting
