A Communication Efficient and Scalable Distributed Data Mining for the Astronomical Data

Aruna Govada; Sanjay K. Sahay

arXiv:1606.07345·astro-ph.IM·November 16, 2018

TL;DR

This paper introduces a scalable distributed data mining method for astronomical data that reduces network transmission and downloading costs while maintaining analysis accuracy, enabling efficient processing of massive datasets.

Contribution

The paper proposes a distributed load-balancing principal component analysis (PCA) approach that spreads the computation across sites and minimizes data transfer costs in astronomical data analysis.
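To make the idea of moving summaries instead of raw data concrete, here is a minimal sketch of a generic distributed PCA. It is not the paper's load-balancing algorithm. It assumes each archive site can compute a row count, column sums, and the matrix X^T X locally, and that a coordinator combines them. The function names (`local_summary`, `global_pca`, `compress`, `reconstruct`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def local_summary(X):
    """Run at each site: return (row count, column sums, X^T X)."""
    X = np.asarray(X, dtype=float)
    return X.shape[0], X.sum(axis=0), X.T @ X


def global_pca(summaries, k):
    """Combine per-site summaries into a global mean and top-k principal axes."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    gram = sum(s[2] for s in summaries)
    mean = total / n
    # Covariance from sufficient statistics: E[x x^T] - mean mean^T.
    cov = gram / n - np.outer(mean, mean)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return mean, eigvecs[:, order]


def compress(X, mean, components):
    """Project rows of X onto the principal axes (n x k scores)."""
    return (np.asarray(X, dtype=float) - mean) @ components


def reconstruct(scores, mean, components):
    """Approximate the original rows from their k-dimensional scores."""
    return scores @ components.T + mean


if __name__ == "__main__":
    # Synthetic stand-in for catalogue data: 10 features driven by 2 latent factors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 10))
    X += 0.01 * rng.normal(size=X.shape)
    sites = np.array_split(X, 3)  # pretend the rows live at 3 archives

    mean, comps = global_pca([local_summary(s) for s in sites], k=2)
    scores = compress(X, mean, comps)
    approx = reconstruct(scores, mean, comps)
    print("values stored per row:", scores.shape[1], "instead of", X.shape[1])
    print("mean squared reconstruction error:", np.mean((approx - X) ** 2))
```

In this sketch each site ships a d x d matrix plus a d-vector regardless of how many rows it holds, and a user who downloads the k-dimensional scores with the shared basis can rebuild an approximation of the rows locally. The actual savings depend on how many components are kept and on the structure of the data.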

Findings

1. Reduces downloading cost by ~90% with negligible accuracy loss
2. Outperforms existing methods in transmission cost
3. Effective on complex astronomical datasets

Abstract

In 2020, ~60 PB of archived data will be accessible to astronomers, but analyzing data at this scale will be a challenging task. The difficulty stems mainly from the prevailing computational model, in which data is downloaded from complex, geographically distributed archives to a central site and then analyzed on local systems. Because the data has to be downloaded to the central site, limited network bandwidth will be a hindrance to scientific discovery. Analyzing petabyte-scale data on local machines in a centralized manner is also challenging. The virtual observatory (VO) is a step towards solving this problem; however, it does not provide a data mining model. Adding a distributed data mining layer to the VO can be the solution: astronomers download the knowledge instead of the raw data, and thereafter they can either reconstruct the data back from the downloaded…
