Ensuring Dataset Quality for Machine Learning Certification

Sylvaine Picard; Camille Chapdelaine; Cyril Cappi; Laurent Gardes; Eric Jenn; Baptiste Lefèvre; Thomas Soumarmon

arXiv:2011.01799 · cs.LG · November 4, 2020

TL;DR

This paper proposes a dataset specification and verification process tailored for ML in safety-critical systems, addressing gaps in existing standards and providing practical recommendations for dataset management.

Contribution

It introduces a new dataset specification and verification process designed specifically for safety-critical ML applications, filling a gap in current standards.

Findings

1. Applied the process to a railway signal recognition system.
2. Provided a list of recommendations for dataset collection and management.
3. Contributed to the development of dataset engineering for safety-critical ML.

Abstract

In this paper, we address the problem of dataset quality in the context of Machine Learning (ML)-based critical systems. We briefly analyse the applicability of some existing standards dealing with data and show that the specificities of the ML context are neither properly captured nor taken into account. As a first answer to this concerning situation, we propose a dataset specification and verification process, and apply it to a signal recognition system from the railway domain. In addition, we give a list of recommendations for the collection and management of datasets. This work is one step towards the dataset engineering process that will be required for ML to be used in safety-critical systems.
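The abstract does not detail how the verification step works. As a rough illustration only, and not the authors' process, the sketch below checks a labelled dataset against a simple written specification: every required (label, operating condition) pair must have at least a minimum number of samples, and no unspecified labels or conditions may appear. The names `DatasetSpec`, `Sample`, `verify`, and the example labels ("stop", "proceed") and conditions ("day", "night") are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class DatasetSpec:
    # Hypothetical specification: the labels and operating conditions the
    # dataset must cover, and the minimum sample count per (label, condition).
    required_labels: set[str]
    required_conditions: set[str]
    min_samples_per_cell: int = 1


@dataclass
class Sample:
    label: str      # hypothetical class label, e.g. the signal aspect shown
    condition: str  # hypothetical operating condition, e.g. "day" or "night"


def verify(samples: list[Sample], spec: DatasetSpec) -> list[str]:
    """Return human-readable violations; an empty list means the spec is met."""
    violations = []
    # Report labels and conditions that the specification does not allow.
    for label in sorted({s.label for s in samples} - spec.required_labels):
        violations.append(f"unexpected label {label!r}")
    for cond in sorted({s.condition for s in samples} - spec.required_conditions):
        violations.append(f"unexpected condition {cond!r}")
    # Check coverage of every required (label, condition) cell.
    counts = Counter((s.label, s.condition) for s in samples)
    for label in sorted(spec.required_labels):
        for cond in sorted(spec.required_conditions):
            n = counts[(label, cond)]
            if n < spec.min_samples_per_cell:
                violations.append(f"{label}/{cond}: {n} < {spec.min_samples_per_cell}")
    return violations


# Usage: one cell ("stop" at "night") is under-represented.
spec = DatasetSpec({"stop", "proceed"}, {"day", "night"}, min_samples_per_cell=2)
data = [
    Sample("stop", "day"), Sample("stop", "day"), Sample("stop", "night"),
    Sample("proceed", "day"), Sample("proceed", "day"),
    Sample("proceed", "night"), Sample("proceed", "night"),
]
print(verify(data, spec))  # ['stop/night: 1 < 2']
```

Returning a list of violations rather than a single boolean keeps the check traceable: each failure names the specification clause it breaks, which is the kind of evidence a certification review would expect to see.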
