HUGO-CS: A Hybrid-Labeled, Uncertainty-Aware, General-Purpose, Observational Dataset for Cold Spray

Stephen Price; Kyle Miller; Marco Musto; Kenneth Kroenlein; James Saal; Kyle Tsaknopoulos; Elke A. Rundensteiner; Danielle L. Cote

arXiv:2605.04257·cs.LG·May 7, 2026

TL;DR

HUGO-CS is a large dataset of cold spray experiments extracted from the scientific literature using a hybrid LLM-based framework with uncertainty-aware manual review, intended to support better process modeling.

Contribution

This work introduces HUGO-CS, the largest cold spray dataset to date, along with a hybrid framework that combines automated extraction with manual validation to pull experimental data from the scientific literature.

Findings

01

HUGO-CS contains 4,383 experiments with 144 features from 1,124 sources, more than 30 times the size of the previous largest dataset (137 samples).

02

The hybrid extraction framework achieves high accuracy with reduced manual effort.

03

The dataset and code are publicly available for benchmarking and research.
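The idea of uncertainty-aware manual review can be illustrated with a minimal sketch. This summary does not say how the framework scores uncertainty or where it sets any cutoff, so everything here is an assumption for illustration: the `ExtractedValue` record, its `confidence` field, the `split_for_review` helper, and the 0.8 threshold are all hypothetical and are not taken from the HUGO-CS code.

```python
from dataclasses import dataclass


@dataclass
class ExtractedValue:
    """One automatically extracted field (hypothetical record layout)."""
    field: str
    value: str
    confidence: float  # assumed score in [0, 1]; not defined by the paper summary


def split_for_review(values, threshold=0.8):
    """Separate extractions into auto-accepted and flagged-for-review lists.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    accepted, needs_review = [], []
    for v in values:
        if v.confidence >= threshold:
            accepted.append(v)
        else:
            needs_review.append(v)
    return accepted, needs_review


# Made-up example records, used only to exercise the helper.
sample = [
    ExtractedValue("gas_temperature", "500 C", 0.95),
    ExtractedValue("gas_pressure", "3 MPa", 0.55),
    ExtractedValue("powder_material", "Al 6061", 0.80),
]
accepted, needs_review = split_for_review(sample)
print([v.field for v in accepted])
print([v.field for v in needs_review])
```

Under this kind of scheme, a reviewer only inspects the flagged values, which is one plausible way manual effort could be reduced without skipping validation entirely.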

Abstract

Cold spraying is an increasingly common approach for repairing and manufacturing components due to its solid-state manufacturing capabilities. However, process optimization remains difficult due to many interdependent parameters and the lack of large-scale, machine-readable data to support modeling. While the scientific literature contains many relevant experiments, results are inconsistently reported (often in tables and figures) and use non-uniform units, limiting utilization at scale. To address these limitations, this work presents HUGO-CS, a literature-derived dataset of 4,383 cold-spray experiments with 144 features from 1,124 sources, exceeding the previous largest dataset (137 samples) by 30x. With completely manual extraction requiring an average of 91 minutes per document, this work designs and leverages a Hybrid-labeled, Uncertainty-aware, General-purpose, Observational…
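The scale figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the total manual-effort estimate additionally assumes that every one of the 1,124 sources would take the reported average of 91 minutes, which the abstract does not claim, so treat that figure as a rough illustration.

```python
# Figures quoted in the abstract.
N_EXPERIMENTS = 4383      # experiments in HUGO-CS
N_SOURCES = 1124          # source documents
PREV_LARGEST = 137        # samples in the previous largest dataset
MANUAL_MIN_PER_DOC = 91   # average minutes for fully manual extraction


def scale_factor(new_size, old_size):
    """Ratio of the new dataset size to the old one."""
    return new_size / old_size


def manual_hours(n_docs, minutes_per_doc):
    """Hours needed if every document took the average manual time (assumption)."""
    return n_docs * minutes_per_doc / 60


ratio = scale_factor(N_EXPERIMENTS, PREV_LARGEST)
hours = manual_hours(N_SOURCES, MANUAL_MIN_PER_DOC)

print(f"Size ratio: {ratio:.1f}x")
print(f"Estimated fully manual effort: {hours:.0f} hours")
```

The ratio comes out just under 32, consistent with the rounded "30x" in the abstract, and the effort estimate lands at roughly 1,700 hours, which shows why fully manual extraction at this scale is impractical.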

Code & Models

Repositories

sprice134/HUGO (GitHub)
