Classification of Astrophysics Journal Articles with Machine Learning to Identify Data for NED

Tracy X. Chen, Rick Ebert, Joseph M. Mazzarella, Cren Frayer, Scott Terek, Ben H. P. Chan, David Cook, Tak Lo, Marion Schmitz, and Xiuqin Wu

arXiv:2201.03636·astro-ph.IM·January 17, 2022

TL;DR

This paper presents a machine learning method to classify astrophysics journal articles by topic and data content, automating a traditionally manual process for the NED database with over 90% accuracy.

Contribution

The authors develop a machine learning approach, integrated into the NED production pipeline, that automates the classification of astrophysics articles by topic and data content, reducing the manual review effort that the conventional process requires.

Findings

1. ML classification achieves over 90% accuracy

2. Automates the process of selecting relevant astrophysics data

3. Reduces human review time substantially

Abstract

The NASA/IPAC Extragalactic Database (NED) is a comprehensive online service that combines fundamental multi-wavelength information for known objects beyond the Milky Way and provides value-added, derived quantities and tools to search and access the data. The contents and relationships between measurements in the database are continuously augmented and revised to stay current with astrophysics literature and new sky surveys. The conventional process of distilling and extracting data from the literature involves human experts to review the journal articles and determine if an article is of extragalactic nature, and if so, what types of data it contains. This is both labor intensive and unsustainable, especially given the ever-increasing number of publications each year. We present here a machine learning (ML) approach developed and integrated into the NED production pipeline to help…
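
The two decisions described in the abstract (is the article extragalactic, and if so, what data does it contain) can be illustrated with a small two-stage text classifier. The sketch below is only an illustration under stated assumptions: the abstract shown here does not name the algorithm, features, or label set used in the NED pipeline, so the multinomial Naive Bayes model, the toy training sentences, the label names ("extragalactic", "other", "redshifts", "photometry"), and the function `classify_article` are all hypothetical. The second stage is also simplified to a single label, whereas a real article may contain several data types.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are dropped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; real training would use expert-labeled articles.
TOPIC_TRAIN = [
    ("redshift survey of galaxies in a distant cluster", "extragalactic"),
    ("photometry of quasars and active galactic nuclei", "extragalactic"),
    ("spectroscopic redshifts for dwarf galaxies", "extragalactic"),
    ("solar flare observations with a coronagraph", "other"),
    ("orbital dynamics of exoplanets around nearby stars", "other"),
    ("asteroid light curves and rotation periods", "other"),
]
DATA_TRAIN = [
    ("spectroscopic redshifts measured for cluster galaxies", "redshifts"),
    ("redshift catalog from a spectroscopic survey", "redshifts"),
    ("optical photometry and magnitudes of galaxies", "photometry"),
    ("infrared flux densities and photometry of quasars", "photometry"),
]

topic_model = NaiveBayes().fit(*zip(*TOPIC_TRAIN))
data_model = NaiveBayes().fit(*zip(*DATA_TRAIN))


def classify_article(text):
    """Stage 1: topic. Stage 2 (data type) runs only for extragalactic articles."""
    topic = topic_model.predict(text)
    data_type = data_model.predict(text) if topic == "extragalactic" else None
    return topic, data_type


print(classify_article("spectroscopic redshifts of dwarf galaxies"))
print(classify_article("solar flare observations"))
```

Gating the second stage on the first mirrors the review order described in the abstract: an article judged not to be extragalactic is never examined for data content.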
