Toward Formal Data Set Verification for Building Effective Machine Learning Models
Jorge L\'opez, Maxime Labonne, Claude Poletti

TL;DR
This paper introduces a formal method for verifying data set properties crucial for effective machine learning model training, using logical transformations and automated solvers to ensure data quality and diversity.
Contribution
It presents a novel formal verification approach for data sets, transforming data into logic formulas and verifying properties with an automated solver.
Findings
Feasibility demonstrated through preliminary experiments.
The approach is flexible for various property specifications.
Prototype tool successfully verifies data set properties.
Abstract
In order to properly train a machine learning model, data must be properly collected. To guarantee a proper data collection, verifying that the collected data set holds certain properties is a possible solution. For example, guaranteeing that the data set contains samples across the whole input space, or that the data set is balanced w.r.t. different classes. We present a formal approach for verifying a set of arbitrarily stated properties over a data set. The proposed approach relies on the transformation of the data set into a first order logic formula, which can be later verified w.r.t. the different properties also stated in the same logic. A prototype tool, which uses the z3 solver, has been developed; the prototype can take as an input a set of properties stated in a formal language and formally verify a given data set w.r.t. to the given set of properties. Preliminary…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning and Data Classification · Advanced Database Systems and Queries · Formal Methods in Verification
