Attesting Distributional Properties of Training Data for Machine Learning

Vasisht Duddu, Anudeep Das, Nora Khayata, Hossein Yalame, Thomas Schneider, N. Asokan

arXiv:2308.09552 · cs.CR · April 10, 2024

TL;DR

This paper introduces a method for verifying that the training data of a machine learning model has specific distributional properties, such as reflecting the diversity of the population, by combining property inference with cryptographic techniques so that the verification preserves the privacy of the data.

Contribution

It proposes a hybrid property attestation approach that combines property inference with cryptographic mechanisms to verify distributional properties of training data without disclosing the data itself.
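
The abstract does not spell out how the two mechanisms are combined. The sketch below is a hypothetical decision rule, assuming that property inference yields an estimate of the property with an error margin, and that the more expensive cryptographic mechanism is consulted only when that estimate cannot settle the question. `hybrid_attest`, `Verdict`, `margin`, and `crypto_check` are illustrative names, not the paper's method or the API of the ssg-research/distribution-attestation repository.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def hybrid_attest(
    estimate: float,
    margin: float,
    low: float,
    high: float,
    crypto_check: Callable[[], bool],
) -> tuple[Verdict, str]:
    """Hypothetical hybrid decision rule (an assumption, not the paper's).

    `estimate` +/- `margin` stands in for the output of a property-inference
    step; `crypto_check` stands in for an exact but costly cryptographic
    attestation. Returns the verdict and the path that produced it.
    """
    lo_est, hi_est = estimate - margin, estimate + margin
    # Whole uncertainty interval inside the required range: accept cheaply.
    if low <= lo_est and hi_est <= high:
        return Verdict.ACCEPT, "inference"
    # Whole interval outside the required range: reject cheaply.
    if hi_est < low or lo_est > high:
        return Verdict.REJECT, "inference"
    # Inconclusive estimate: fall back to the cryptographic mechanism.
    return (Verdict.ACCEPT if crypto_check() else Verdict.REJECT), "crypto"


verdict, path = hybrid_attest(0.50, 0.05, 0.4, 0.6, crypto_check=lambda: True)
print(verdict.value, path)  # accept inference
verdict, path = hybrid_attest(0.41, 0.05, 0.4, 0.6, crypto_check=lambda: False)
print(verdict.value, path)  # reject crypto
```

The design choice illustrated here is that the cheap path answers whenever it is unambiguous, so the cost of the cryptographic path is paid only for borderline cases.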

Findings

01 Demonstrates effective hybrid property attestation

02 Supports regulatory compliance requirements on data diversity

03 Preserves data privacy during verification

Abstract

The success of machine learning (ML) has been accompanied by increased concerns about its trustworthiness. Several jurisdictions are preparing ML regulatory frameworks. One such concern is ensuring that model training data has desirable distributional properties for certain sensitive attributes. For example, draft regulations indicate that model trainers are required to show that training datasets have specific distributional properties, such as reflecting diversity of the population. We propose the notion of property attestation allowing a prover (e.g., model trainer) to demonstrate relevant distributional properties of training data to a verifier (e.g., a customer) without revealing the data. We present an effective hybrid property attestation combining property inference with cryptographic mechanisms.
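
To make "distributional property" concrete, the following minimal Python sketch assumes the property is the share of training records whose sensitive attribute takes a given value, and that the requirement is a closed interval for that share. `PropertyRequirement`, `attribute_ratio`, `satisfies`, and the toy `sex` attribute are hypothetical illustrations, not part of the paper or its repository. This direct computation needs the raw records; property attestation is meant to convince a verifier of such a result without revealing them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PropertyRequirement:
    # Hypothetical requirement: the share of records whose sensitive
    # attribute equals `value` must lie in the closed interval [low, high].
    attribute: str
    value: str
    low: float
    high: float


def attribute_ratio(records: Sequence[dict], attribute: str, value: str) -> float:
    """Fraction of records whose `attribute` equals `value`."""
    if not records:
        raise ValueError("dataset is empty")
    matches = sum(1 for r in records if r[attribute] == value)
    return matches / len(records)


def satisfies(requirement: PropertyRequirement, ratio: float) -> bool:
    """Check a computed (or claimed) ratio against the requirement."""
    return requirement.low <= ratio <= requirement.high


# Toy dataset held by the prover; in property attestation the verifier
# would not see these records directly.
training_data = [
    {"sex": "female"}, {"sex": "male"}, {"sex": "female"}, {"sex": "male"},
    {"sex": "male"}, {"sex": "female"}, {"sex": "male"}, {"sex": "female"},
]
req = PropertyRequirement(attribute="sex", value="female", low=0.4, high=0.6)
ratio = attribute_ratio(training_data, req.attribute, req.value)
print(ratio, satisfies(req, ratio))  # 0.5 True
```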

Code & Models

Repositories

ssg-research/distribution-attestation (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Data Quality and Management