Minimal Re-computation for Exploratory Data Analysis in Astronomy
Bojan Nikolic, Des Small, Mark Kettenis

TL;DR
This paper introduces a memoization-based technique to minimize re-computation in iterative astronomical data analysis, enhancing efficiency, reproducibility, and error reduction, especially on cluster systems.
Contribution
It presents a novel minimal re-computation method for exploratory astronomy data analysis, including implementation details and storage optimization strategies.
Findings
Improved efficiency of data analysis workflows.
Reduced user errors and enhanced reproducibility.
Effective storage optimization with copy-on-write and de-duplication.
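The storage finding can be illustrated with a small sketch of block-level de-duplication: when two intermediate data products differ only in a few blocks, only the changed blocks need additional space. This is not the paper's measurement procedure. The function name `dedup_blocks`, the fixed 4096-byte block size, and the toy data are all assumptions made for illustration.

```python
import hashlib


def dedup_blocks(blobs, block_size=4096):
    """Return (total_blocks, unique_blocks) over a list of byte strings.

    Hypothetical helper: splits each blob into fixed-size blocks and
    counts how many distinct block contents occur, identified by SHA-256.
    """
    seen = set()
    total = 0
    for blob in blobs:
        for offset in range(0, len(blob), block_size):
            total += 1
            seen.add(hashlib.sha256(blob[offset:offset + block_size]).digest())
    return total, len(seen)


# Toy "intermediate product": four blocks with distinct contents.
original = b"".join(bytes([i]) * 4096 for i in range(4))
# A later processing step that rewrites only the last block.
modified = original[:3 * 4096] + bytes([9]) * 4096

total, unique = dedup_blocks([original, modified])
print(total, unique)  # 8 blocks stored naively, 5 after de-duplication
```

In this toy case the second product costs one extra block rather than four, which is the kind of saving that copy-on-write and de-duplication aim for when memoized products share most of their content.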
Abstract
We present a technique to automatically minimise re-computation when a data analysis program is iteratively changed or extended, as is often the case in exploratory data analysis in astronomy. A typical example is the flagging and calibration of demanding or unusual observations, where visual inspection suggests improvements to the processing strategy. The technique is based on memoization and referentially transparent tasks. We describe the implementation of this technique for the CASA radio astronomy data reduction package. We also propose a technique for optimising the storage efficiency of memoized intermediate data products using copy-on-write and block-level de-duplication, and measure their practical efficiency. We find the minimal re-computation technique improves the efficiency of data analysis while reducing the possibility for user error and improving the reproducibility of the…
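A minimal sketch of the idea of memoizing referentially transparent tasks is given below. It is not the paper's CASA implementation: the `MemoStore` class and the `flag`, `calibrate`, and `pipeline` functions are hypothetical stand-ins, and results are held in memory rather than on disk. Each task result is keyed by a hash of the task name, its input data, and its parameters, so a task re-runs only when something it depends on has changed.

```python
import hashlib
import json


def _digest(obj):
    """Hash a JSON-serialisable description of a task invocation."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class MemoStore:
    """In-memory stand-in for a store of memoized intermediate products."""

    def __init__(self):
        self._cache = {}
        self.executed = []  # names of tasks that actually ran

    def run(self, task, data, **params):
        # The key depends only on the task name, its input and its parameters,
        # which is valid only if the task is referentially transparent.
        key = _digest([task.__name__, data, params])
        if key not in self._cache:
            self.executed.append(task.__name__)
            self._cache[key] = task(data, **params)
        return self._cache[key]


def flag(data, threshold):
    """Toy flagging step: drop samples whose magnitude exceeds the threshold."""
    return [x for x in data if abs(x) <= threshold]


def calibrate(data, gain):
    """Toy calibration step: apply a constant gain."""
    return [x * gain for x in data]


def pipeline(store, raw, threshold, gain):
    flagged = store.run(flag, raw, threshold=threshold)
    return store.run(calibrate, flagged, gain=gain)


store = MemoStore()
raw = [1.0, 50.0, 2.0, -3.0]
pipeline(store, raw, threshold=10.0, gain=2.0)  # runs flag and calibrate
pipeline(store, raw, threshold=10.0, gain=0.5)  # reuses flag, re-runs calibrate
print(store.executed)  # ['flag', 'calibrate', 'calibrate']
```

Changing only the calibration gain leaves the flagging result untouched in the store, so the second run skips that step. This is the behaviour the abstract describes for iteratively edited analysis scripts.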
