Continuously Updated Data Analysis Systems
Lee F. Richardson

TL;DR
This paper introduces the Continuously Updated Data-Analysis System (CUDAS), an idealized final product of a data science project that synthesizes ideas from successful projects such as Nate Silver's FiveThirtyEight. It demonstrates the concept with two systems: one for soccer player ratings and one for synthetic ecosystems used in infectious disease modeling.
Contribution
It defines the CUDAS framework and demonstrates its application through two systems: one for soccer player ratings and another for synthetic ecosystem data generation.
Findings
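Built a continuously updated rating system for soccer players, based on the newly developed Augmented Adjusted Plus-Minus statistic.
Created a large dataset of synthetic ecosystems, used for agent-based modeling of infectious diseases.
Showed that the CUDAS concept applies across domains as different as sports ratings and infectious disease modeling.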
Abstract
When doing data science, it's important to know what you're building. This paper describes an idealized final product of a data science project, called a Continuously Updated Data-Analysis System (CUDAS). The CUDAS concept synthesizes ideas from a range of successful data science projects, such as Nate Silver's FiveThirtyEight. A CUDAS can be built for any context, such as the state of the economy, the state of the climate, and so on. To demonstrate, we build two CUDAS. The first provides continuously updated ratings for soccer players, based on the newly developed Augmented Adjusted Plus-Minus statistic. The second creates a large dataset of synthetic ecosystems, which is used for agent-based modeling of infectious diseases.
Taxonomy
Topics: Scientific Computing and Data Management · Data Visualization and Analytics · Big Data and Digital Economy
