A Communication Efficient and Scalable Distributed Data Mining for the Astronomical Data
Aruna Govada, Sanjay K. Sahay

TL;DR
This paper introduces a scalable distributed data mining method for astronomical data that reduces network transmission and downloading costs while maintaining analysis accuracy, enabling efficient processing of massive datasets.
Contribution
It proposes a distributed load balancing PCA approach to optimize computation distribution and minimize data transfer costs in astronomical data analysis.
Findings
Reduces downloading cost by ~90% with negligible accuracy loss
Outperforms existing methods in transmission cost
Effective on complex astronomical datasets
Abstract
In 2020, ~60 PB of archived data will be accessible to astronomers, but analyzing such a large volume of data will be a challenging task. This is mainly due to the prevailing computational model, in which data is downloaded from complex, geographically distributed archives to a central site and then analyzed on local systems. Because the data has to be downloaded to the central site, limited network bandwidth will be a hindrance to scientific discoveries. Analyzing PB-scale data on local machines in a centralized manner is also challenging. The virtual observatory (VO) is a step towards addressing this problem; however, it does not provide a data mining model. Adding a distributed data mining layer to the VO can be the solution, in which astronomers download the knowledge instead of the raw data and thereafter can either reconstruct the data back from the downloaded…
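The idea of downloading knowledge rather than raw data can be illustrated with a minimal Python sketch. It is not the authors' load-balancing scheme, which this summary does not describe. It assumes plain PCA computed by combining per-site sufficient statistics, and all function names (`local_summary`, `global_pca`, `compress`, `reconstruct`) and the synthetic data are hypothetical. Each archive sends only its row count, column sums, and Gram matrix to a coordinator. The coordinator derives the global principal axes, and the astronomer downloads the low-dimensional projections and reconstructs an approximation locally.

```python
import numpy as np

def local_summary(X):
    """Per-site statistics sent to the coordinator: row count, column sums, X^T X."""
    return X.shape[0], X.sum(axis=0), X.T @ X

def global_pca(summaries, k):
    """Combine per-site summaries into the global mean and top-k principal axes."""
    n = sum(s[0] for s in summaries)
    mean = sum(s[1] for s in summaries) / n
    gram = sum(s[2] for s in summaries)
    # Population covariance from pooled statistics: E[x x^T] - mean mean^T
    cov = gram / n - np.outer(mean, mean)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    return mean, eigvecs[:, top]

def compress(X, mean, components):
    """Project a site's data onto the principal axes (what gets downloaded)."""
    return (X - mean) @ components

def reconstruct(Z, mean, components):
    """Approximate the original data from the downloaded projections."""
    return Z @ components.T + mean

# Synthetic stand-in for three archives (assumption: 30 features, rank-3 signal plus noise).
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 30))
sites = [rng.normal(size=(n, 3)) @ mixing + 0.01 * rng.normal(size=(n, 30))
         for n in (200, 500, 300)]

mean, components = global_pca([local_summary(X) for X in sites], k=3)
downloads = [compress(X, mean, components) for X in sites]

raw_values = sum(X.size for X in sites)
sent_values = sum(Z.size for Z in downloads) + mean.size + components.size
print("fraction of raw values downloaded:", sent_values / raw_values)
```

The download fraction in this sketch is governed by the ratio of retained components to original features, so it depends entirely on the arbitrary synthetic setup and says nothing about the paper's measured savings.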
