Reducing the Long Tail Losses in Scientific Emulations with Active Learning

Yi Heng Lim; Muhammad Firmansyah Kasim

arXiv:2111.08498·cs.LG·January 11, 2022

TL;DR

This paper introduces an active learning method using core-set selection and a warm start trick to reduce long tail losses in scientific emulation models, improving accuracy and efficiency.

Contribution

It presents a novel active learning approach with a warm start technique to effectively reduce long tail errors in scientific emulation tasks.
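The abstract names the warm start as a shrink-and-perturb trick. A minimal sketch of that idea, assuming the common formulation in which the previous round's weights are scaled toward zero and small Gaussian noise is added before retraining on the enlarged labelled set. The function name and the `shrink` and `noise_std` values are hypothetical and are not taken from the paper or its repository.

```python
import random


def shrink_and_perturb(weights, shrink=0.5, noise_std=0.01, seed=None):
    """Return warm-start weights: shrink * w + Gaussian noise.

    `shrink` and `noise_std` are illustrative defaults (assumptions),
    not values reported by the authors.
    """
    rng = random.Random(seed)
    # Scale each old weight toward zero, then add zero-mean noise.
    return [shrink * w + rng.gauss(0.0, noise_std) for w in weights]


# Weights from the previous active learning round (made-up numbers).
old_weights = [2.0, -4.0, 0.5]
new_weights = shrink_and_perturb(old_weights, seed=0)
```

Setting `shrink=1.0` and `noise_std=0.0` in this sketch reduces to a plain warm start, while `shrink=0.0` discards the old weights and leaves only small random values.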

Findings

01 Achieved competitive performance with less labeled data.

02 Successfully reduced long tail losses in model training.

03 Demonstrated effectiveness across astrophysics and plasma physics case studies.

Abstract

Deep-learning-based models are increasingly used to emulate scientific simulations to accelerate scientific research. However, accurate, supervised deep learning models require a huge amount of labelled data, and that often becomes the bottleneck in employing neural networks. In this work, we leveraged an active learning approach called core-set selection to actively select data, per a pre-defined budget, to be labelled for training. To further improve the model performance and reduce the training costs, we also warm started the training using a shrink-and-perturb trick. We tested on two case studies in different fields, namely galaxy halo occupation distribution modelling in astrophysics and x-ray emission spectroscopy in plasma physics, and the results are promising: we achieved competitive overall performance compared to using a random sampling baseline, and more importantly,…
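Core-set selection under a labelling budget is often implemented as a greedy k-center search: repeatedly pick the unlabelled point that lies farthest from everything already labelled or picked. The sketch below assumes that variant and plain Euclidean distance on the input points. The listing does not say which variant or feature space the authors used, and the function name and toy data are hypothetical.

```python
import math


def k_center_greedy(pool, labelled, budget):
    """Return indices of up to `budget` pool points chosen by greedy k-center.

    pool     -- unlabelled candidate points (sequences of floats)
    labelled -- points that already have labels (must be non-empty)
    """
    if not labelled:
        raise ValueError("need at least one labelled point to start from")
    # Distance from each pool point to its nearest labelled point.
    min_dist = [min(math.dist(p, c) for c in labelled) for p in pool]
    chosen = []
    remaining = set(range(len(pool)))
    while remaining and len(chosen) < budget:
        # Pick the candidate that is least well covered so far.
        idx = max(remaining, key=lambda i: min_dist[i])
        chosen.append(idx)
        remaining.remove(idx)
        # The new pick now covers its neighbours; refresh the distances.
        for i in remaining:
            min_dist[i] = min(min_dist[i], math.dist(pool[i], pool[idx]))
    return chosen


# Toy 2D simulation inputs (made-up values).
pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 0.0)]
labelled = [(0.0, 0.0)]
picked = k_center_greedy(pool, labelled, budget=2)
```

In this toy run the two picks are the points farthest from the labelled origin and from each other, so near-duplicates of existing labels are left unlabelled. That is the intuition for why such a scheme might help with rarely sampled regions, where long tail losses tend to arise.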

Code & Models

Repositories

machine-discovery/research (PyTorch, official)

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Machine Learning in Materials Science