A Study on Classification in Imbalanced and Partially-Labelled Data Streams
R. J. Lyon, J. M. Brooke, J. D. Knowles, B. W. Stappers

TL;DR
This paper examines the challenges of classifying imbalanced, partially labelled data streams in radio astronomy, specifically pulsar detection with the SKA, and highlights both the limitations and the potential of current stream classification algorithms.
Contribution
It evaluates the feasibility of existing stream classification algorithms for astronomical data, revealing their limitations and potential for real-time pulsar signal detection.
Findings
Existing stream learners have low recall on real astronomical data.
Stream learners achieve low false positive rates and accuracy comparable to static models.
Online classification shows potential for big data astronomical applications.
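The findings above combine low recall with a low false positive rate and high accuracy, which is typical of heavily imbalanced data. The following is a minimal sketch of how these three metrics can be tracked incrementally over a stream. The class name `RunningConfusion`, the synthetic stream, and the always-negative classifier are illustrative assumptions, not the paper's method or data.

```python
from dataclasses import dataclass


@dataclass
class RunningConfusion:
    """Confusion counts updated one example at a time (hypothetical helper)."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    def update(self, y_true: int, y_pred: int) -> None:
        # Label 1 is the rare positive class (e.g. pulsar), 0 is the majority.
        if y_true == 1 and y_pred == 1:
            self.tp += 1
        elif y_true == 0 and y_pred == 1:
            self.fp += 1
        elif y_true == 0 and y_pred == 0:
            self.tn += 1
        else:
            self.fn += 1

    def recall(self) -> float:
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    def false_positive_rate(self) -> float:
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else 0.0

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0


# Synthetic stream (assumption): 1000 examples, every 100th one is positive.
stream = [1 if i % 100 == 0 else 0 for i in range(1000)]

metrics = RunningConfusion()
for y_true in stream:
    y_pred = 0  # A degenerate classifier that always predicts the majority class.
    metrics.update(y_true, y_pred)

print(metrics.recall(), metrics.false_positive_rate(), metrics.accuracy())
# prints: 0.0 0.0 0.99
```

In this toy stream, a classifier that never predicts the positive class still scores 99% accuracy and a perfect false positive rate, which is why recall needs to be reported separately when the classes are imbalanced.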
Abstract
The domain of radio astronomy is currently facing significant computational challenges, foremost amongst which are those posed by the development of the world's largest radio telescope, the Square Kilometre Array (SKA). Preliminary specifications for this instrument suggest that the final design will incorporate between 2000 and 3000 individual 15 metre receiving dishes, which together can be expected to produce a data rate of many TB/s. Given such a high data rate, it becomes crucial to consider how this information will be processed and stored to maximise its scientific utility. In this paper, we consider one possible data processing scenario for the SKA, for the purposes of an all-sky pulsar survey. In particular we treat the selection of promising signals from the SKA processing pipeline as a data stream classification problem. We consider the feasibility of classifying signals that…
