Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy

Thomas Devine; Katerina Goseva-Popstojanova; Di Pang

arXiv:1810.03190 · astro-ph.IM · October 9, 2018 · ICPP

TL;DR

This paper presents a scalable, parallel approach to detecting and classifying radio pulsar signals in large datasets using Apache Spark and machine learning, reducing the time needed for both candidate identification and classification.

Contribution

It introduces a scalable Spark-based detection algorithm and a novel multiclass machine learning technique with feature selection for faster pulsar classification.
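The abstract does not describe the detection algorithm itself, so the following is only a minimal Python sketch of the general pattern behind parallel candidate identification, not the authors' Scala/Spark implementation. It assumes that each single-pulse event carries a hypothetical `obs_id`, a trial DM, an arrival time, and an S/N; that observations are independent, so each can be searched separately (analogous to a Spark map over partitions, emulated here with a local thread pool); and that a candidate is a group of at least `min_events` events whose arrival times chain together within `time_tol` seconds, reported by its peak-S/N event. `Event`, `find_candidates`, `identify_in_parallel`, `time_tol`, and `min_events` are all hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    """One single-pulse event (hypothetical record layout)."""
    obs_id: str   # which observation/file the event came from
    dm: float     # trial dispersion measure
    time: float   # arrival time in seconds
    snr: float    # signal-to-noise ratio


def find_candidates(events: List[Event], time_tol: float = 0.05,
                    min_events: int = 3) -> List[Event]:
    """Chain time-sorted events whose gaps are <= time_tol into groups;
    keep groups with at least min_events members and report each group's
    peak-S/N event as the candidate."""
    candidates: List[Event] = []
    group: List[Event] = []
    for ev in sorted(events, key=lambda e: e.time):
        # A gap larger than time_tol closes the current group.
        if group and ev.time - group[-1].time > time_tol:
            if len(group) >= min_events:
                candidates.append(max(group, key=lambda e: e.snr))
            group = []
        group.append(ev)
    # Flush the last open group.
    if len(group) >= min_events:
        candidates.append(max(group, key=lambda e: e.snr))
    return candidates


def identify_in_parallel(events: List[Event], workers: int = 4) -> Dict[str, List[Event]]:
    """Partition events by observation, then search each partition
    independently (the 'map' step); threads stand in for cluster executors."""
    partitions: Dict[str, List[Event]] = defaultdict(list)
    for ev in events:
        partitions[ev.obs_id].append(ev)
    keys = sorted(partitions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda k: find_candidates(partitions[k]), keys))
    return dict(zip(keys, results))
```

Because the partitions share no state, the same per-partition function could be handed to a cluster framework without changing the search logic. In CPython the thread pool only illustrates that structure rather than delivering a CPU speedup, and splitting a single observation across partitions would additionally require handling groups that straddle partition boundaries.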

Findings

1. Up to 5x speedup in candidate identification
2. 54% average speed improvement in machine learning classification
3. Less than 2% reduction in classification accuracy

Abstract

Data collection for scientific applications is increasing exponentially and is forecasted to soon reach peta- and exabyte scales. Applications which process and analyze scientific data must be scalable and focus on execution performance to keep pace. In the field of radio astronomy, in addition to increasingly large datasets, tasks such as the identification of transient radio signals from extrasolar sources are computationally expensive. We present a scalable approach to radio pulsar detection written in Scala that parallelizes candidate identification to take advantage of in-memory task processing using Apache Spark on a YARN distributed system. Furthermore, we introduce a novel automated multiclass supervised machine learning technique that we combine with feature selection to reduce the time required for candidate classification. Experimental testing on a Beowulf cluster with 15…
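The abstract does not say which classifiers or feature-selection criterion the authors combined. As an illustration only, the Python sketch below shows how feature selection can reduce classification cost: features are ranked by a Fisher score (between-class variance of the class means divided by within-class variance), only the top `k` are kept, and a simple multiclass nearest-centroid classifier then operates on that reduced feature vector. Both the Fisher score and the nearest-centroid model are generic techniques swapped in for this example, not the paper's method, and `fisher_scores`, `select_features`, and `NearestCentroid` are hypothetical names.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[str]) -> List[float]:
    """Per-feature Fisher score: sum_c n_c (mu_c - mu)^2 / sum_c n_c sigma_c^2."""
    scores: List[float] = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = mean(col)
        between = within = 0.0
        for c in set(y):
            vals = [v for v, label in zip(col, y) if label == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if the means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_features(X, y, k: int) -> List[int]:
    """Indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


class NearestCentroid:
    """Multiclass classifier that only looks at the selected feature indices."""

    def __init__(self, features: List[int]):
        self.features = features
        self.centroids: Dict[str, List[float]] = {}

    def fit(self, X, y) -> "NearestCentroid":
        for c in set(y):
            rows = [[x[j] for j in self.features] for x, label in zip(X, y) if label == c]
            self.centroids[c] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, x: Sequence[float]) -> str:
        z = [x[j] for j in self.features]
        # Assign to the class whose centroid is closest in squared Euclidean distance.
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(z, self.centroids[c])))
```

On a toy dataset with three hypothetical classes, a feature whose mean is identical in every class scores near zero and is dropped, so `predict` never reads it. With fewer features, each prediction does proportionally less arithmetic, which is the kind of saving that can trade a small amount of accuracy for speed.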