Attesting Distributional Properties of Training Data for Machine Learning
Vasisht Duddu, Anudeep Das, Nora Khayata, Hossein Yalame, Thomas, Schneider, N. Asokan

TL;DR
This paper introduces a method for verifying that training data for machine learning models has specific distributional properties, such as diversity, using cryptographic techniques to enable privacy-preserving attestation.
Contribution
It proposes a novel hybrid property attestation approach that combines property inference with cryptography to verify data distributional properties without data disclosure.
Findings
Effective hybrid property attestation demonstrated
Supports regulatory compliance for data diversity
Preserves data privacy during verification
Abstract
The success of machine learning (ML) has been accompanied by increased concerns about its trustworthiness. Several jurisdictions are preparing ML regulatory frameworks. One such concern is ensuring that model training data has desirable distributional properties for certain sensitive attributes. For example, draft regulations indicate that model trainers are required to show that training datasets have specific distributional properties, such as reflecting diversity of the population. We propose the notion of property attestation allowing a prover (e.g., model trainer) to demonstrate relevant distributional properties of training data to a verifier (e.g., a customer) without revealing the data. We present an effective hybrid property attestation combining property inference with cryptographic mechanisms.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPrivacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Data Quality and Management
