# Modeling with the Crowd: Optimizing the Human-Machine Partnership with Zooniverse

**Authors:** Hugh Dickinson, Lucy Fortson, Claudia Scarlata, Melanie Beck, Mike Walmsley

arXiv: 1903.07776 · 2020-06-17

## TL;DR

This paper presents results from Galaxy Zoo Express (GZX), an experiment that simulated real-time Bayesian aggregation of citizen scientists' binary responses to rapidly produce reliably labeled datasets for training deep learning models on large-scale astronomical survey data.

## Contribution

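The paper presents selected GZX results and shows how its Bayesian aggregation engine, which combines citizen scientists' responses in real time and enables online training of machine classifiers, can be extended to provide object-localization and bounding-box annotations with quantified reliability.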
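To make the idea of Bayesian aggregation of binary responses concrete, here is a minimal sketch. It is not the GZX implementation. It assumes each volunteer is described by two hypothetical skill probabilities (how often they answer "yes" when the true label is "yes", and "no" when it is "no"), both strictly between 0 and 1, and that responses are conditionally independent given the true label. The names `Volunteer`, `update_posterior`, and `aggregate`, and the threshold values, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Volunteer:
    """Hypothetical per-volunteer skill estimates (assumed, not from the paper)."""
    p_yes_given_yes: float  # P(volunteer says "yes" | true label is "yes")
    p_no_given_no: float    # P(volunteer says "no"  | true label is "no")


def update_posterior(prior: float, said_yes: bool, v: Volunteer) -> float:
    """Apply Bayes' rule for one binary response.

    Returns the updated probability that the true label is "yes".
    """
    if said_yes:
        like_yes = v.p_yes_given_yes
        like_no = 1.0 - v.p_no_given_no
    else:
        like_yes = 1.0 - v.p_yes_given_yes
        like_no = v.p_no_given_no
    evidence = like_yes * prior + like_no * (1.0 - prior)
    return like_yes * prior / evidence


def aggregate(
    prior: float,
    responses: Iterable[Tuple[bool, Volunteer]],
    accept: float = 0.99,  # assumed threshold for a confident "yes"
    reject: float = 0.01,  # assumed threshold for a confident "no"
) -> Tuple[float, int]:
    """Process responses as they arrive, stopping once a threshold is crossed.

    Returns the final posterior and the number of responses consumed.
    """
    p = prior
    used = 0
    for said_yes, v in responses:
        p = update_posterior(p, said_yes, v)
        used += 1
        if p >= accept or p <= reject:
            break
    return p, used


if __name__ == "__main__":
    v = Volunteer(p_yes_given_yes=0.8, p_no_given_no=0.8)
    posterior, n_used = aggregate(0.5, [(True, v)] * 5)
    print(f"posterior={posterior:.4f} after {n_used} responses")
```

Because the posterior is updated one response at a time, a subject can stop collecting classifications as soon as its label is sufficiently certain. That early stopping is what makes this style of aggregation suited to producing labels quickly.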

## Key findings

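- Bayesian aggregation of binary crowd responses yields rapidly generated, reliably labeled datasets.
- The aggregation engine can be extended to provide object-localization and bounding-box annotations of two-dimensional data with quantified reliability.
- DL models trained on these annotations are expected to facilitate morphological classification and substructure detection in direct imaging, as well as decontamination and emission line identification in slitless spectroscopy.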

## Abstract

LSST and Euclid must address the daunting challenge of analyzing the unprecedented volumes of imaging and spectroscopic data that these next-generation instruments will generate. A promising approach to overcoming this challenge involves rapid, automatic image processing using appropriately trained Deep Learning (DL) algorithms. However, reliable application of DL requires large, accurately labeled samples of training data. Galaxy Zoo Express (GZX) is a recent experiment that simulated using Bayesian inference to dynamically aggregate binary responses provided by citizen scientists via the Zooniverse crowd-sourcing platform in real time. The GZX approach enables collaboration between human and machine classifiers and provides rapidly generated, reliably labeled datasets, thereby enabling online training of accurate machine classifiers. We present selected results from GZX and show how the Bayesian aggregation engine it uses can be extended to efficiently provide object-localization and bounding-box annotations of two-dimensional data with quantified reliability. DL algorithms that are trained using these annotations will facilitate numerous panchromatic data modeling tasks including morphological classification and substructure detection in direct imaging, as well as decontamination and emission line identification for slitless spectroscopy. Effectively combining the speed of modern computational analyses with the human capacity to extrapolate from few examples will be critical if the potential of forthcoming large-scale surveys is to be realized.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.07776/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1903.07776/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/1903.07776/full.md

---
Source: https://tomesphere.com/paper/1903.07776