Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views
Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska

TL;DR
Stale View Cleaning (SVC) offers an efficient data cleaning approach to provide accurate, approximate query answers from stale materialized views between maintenance cycles, especially effective with skewed data distributions.
Contribution
SVC introduces a sampling-based data cleaning method for stale materialized views, improving query accuracy and efficiency over traditional full maintenance methods.
Findings
Cleaning a sample is more efficient than full view maintenance.
Query results estimated from the cleaned sample are more accurate than results computed on the stale view.
SVC is effective across various materialized view types.
Abstract
Materialized views (MVs), stored pre-computed results, are widely used to facilitate fast queries on large datasets. When new records arrive at a high rate, it is infeasible to continuously update (maintain) MVs and a common solution is to defer maintenance by batching updates together. Between batches the MVs become increasingly stale with incorrect, missing, and superfluous rows leading to increasingly inaccurate query results. We propose Stale View Cleaning (SVC) which addresses this problem from a data cleaning perspective. In SVC, we efficiently clean a sample of rows from a stale MV, and use the clean sample to estimate aggregate query results. While approximate, the estimated query results reflect the most recent data. As sampling can be sensitive to long-tailed distributions, we further explore an outlier indexing technique to give increased accuracy when the data distributions…
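The sample-then-estimate idea in the abstract can be illustrated with a small Python sketch for a SUM query. This is not the paper's implementation: the function names, the hash-based sampling rule, and the two estimators shown (scaling up the clean sample directly, and using the sample to correct the stale total) are assumptions chosen for illustration. The sketch also takes the full up-to-date view as input to keep the simulation simple, whereas a real system would clean only the sampled rows.

```python
import hashlib


def in_sample(key, ratio):
    """Deterministically decide whether a row key belongs to the sample.

    Hashing the key (a hypothetical choice for this sketch) ensures the
    stale and up-to-date versions of the same row are sampled together.
    """
    digest = hashlib.sha256(str(key).encode()).digest()
    # Compare the first 8 bytes of the hash against a threshold in [0, 2**64].
    return int.from_bytes(digest[:8], "big") < int(ratio * 2**64)


def estimate_sum(stale_view, fresh_view, ratio):
    """Estimate SUM over the up-to-date view from a cleaned sample.

    stale_view, fresh_view: dicts mapping row key -> numeric value.
    ratio: sampling ratio in (0, 1].
    Returns (direct_estimate, corrected_estimate).
    """
    stale_sample = {k: v for k, v in stale_view.items() if in_sample(k, ratio)}
    # "Cleaning" the sample: updated rows get new values, superfluous rows
    # disappear, and missing rows whose keys hash into the sample appear.
    clean_sample = {k: v for k, v in fresh_view.items() if in_sample(k, ratio)}

    clean_sum = sum(clean_sample.values())
    stale_sum = sum(stale_sample.values())

    # Estimator 1: scale the clean sample up to the full view.
    direct = clean_sum / ratio
    # Estimator 2: keep the stale total and correct it by the scaled
    # difference observed on the sample.
    corrected = sum(stale_view.values()) + (clean_sum - stale_sum) / ratio
    return direct, corrected


if __name__ == "__main__":
    stale = {"a": 10, "b": 20, "c": 30}   # view as of the last maintenance
    fresh = {"a": 12, "c": 30, "d": 5}    # "a" updated, "b" superfluous, "d" missing
    print(estimate_sum(stale, fresh, ratio=0.5))
```

With a sampling ratio of 1.0 both estimators reduce to the exact up-to-date sum. At lower ratios they trade accuracy for the cost of cleaning fewer rows, and the corrected estimator returns the stale total unchanged when no sampled row has changed.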
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Advanced Data Storage Technologies
