TL;DR
This paper presents a data valuation framework using Shapley values to improve the selection of training data in isotope-based geographic origin verification, enhancing model robustness and accuracy in supply chain monitoring.
Contribution
It introduces a novel data valuation method that guides strategic sampling in isotope analysis models, deployed in a real-world provenance verification system.
Findings
Improved model robustness and accuracy in provenance verification.
Effective prioritization of highly informative samples for training.
Enhanced regulatory enforcement and fraud mitigation.
Abstract
Determining and verifying product provenance remains a critical challenge in global supply chains, particularly as geopolitical conflicts and shifting borders create new incentives for misrepresentation of commodities, such as hiding the origin of illegally harvested timber or stolen agricultural products. Stable Isotope Ratio Analysis (SIRA), combined with Gaussian process regression-based isoscapes, has emerged as a powerful tool for geographic origin verification. While these models are now actively deployed in operational settings supporting regulators, certification bodies, and companies, they remain constrained by data scarcity and suboptimal dataset selection. In this work, we introduce a novel deployed data valuation framework designed to enhance the selection and utilization of training data for machine learning models applied in SIRA. By quantifying the marginal utility of…
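As a rough illustration of what "marginal utility" means in Shapley-based data valuation, the sketch below computes exact Shapley values for a tiny training set. Everything in it is an assumption made for illustration: the function names `utility` and `shapley_values`, the toy one-dimensional data, and the 1-nearest-neighbour predictor, which stands in for the Gaussian process isoscape model described in the abstract. Exact enumeration over permutations is only feasible for a handful of points; it is not claimed to be the estimator used in the paper.

```python
from itertools import permutations
from math import factorial


def utility(train, valid):
    """Toy utility: negative mean squared error of a 1-nearest-neighbour
    predictor (a hypothetical stand-in for a Gaussian process model)."""
    if not train:
        # Assumption: with no training data, predict a constant 0.0.
        preds = [0.0 for _ in valid]
    else:
        preds = [min(train, key=lambda p: abs(p[0] - x))[1] for x, _ in valid]
    return -sum((p - y) ** 2 for p, (_, y) in zip(preds, valid)) / len(valid)


def shapley_values(train, valid):
    """Exact Shapley value of each training point: its marginal change in
    utility, averaged over every order in which points could be added."""
    n = len(train)
    values = [0.0] * n
    for order in permutations(range(n)):
        subset = []
        prev = utility(subset, valid)
        for i in order:
            subset.append(train[i])
            cur = utility(subset, valid)
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return [v / factorial(n) for v in values]


# Toy (location, measurement) pairs; the last point is a deliberately bad sample.
train = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (1.5, 9.0)]
valid = [(0.1, 1.1), (0.9, 1.9), (2.1, 3.1)]

values = shapley_values(train, valid)
for point, value in zip(train, values):
    print(point, round(value, 3))
```

In this toy setup the bad sample receives a negative value while the three consistent samples receive positive ones, which is the kind of signal a valuation framework could use to prioritize or discard samples. The values also sum to the utility gap between the full and the empty training set, a standard property of Shapley values.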
Taxonomy
Methods: Gaussian Process
