Optimistic Verifiable Training by Controlling Hardware Nondeterminism

Megha Srivastava; Simran Arora; Dan Boneh

arXiv:2403.09603 · cs.CR · November 26, 2024 · 2 cites

TL;DR

This paper introduces a verifiable training method that controls hardware nondeterminism so that a training run can be replicated exactly across different GPU types, letting a third-party auditor check the trainer's work.

Contribution

The approach combines training in a higher precision than the target, rounding after intermediate computations, and adaptive thresholding to make training reproducible and verifiable across different hardware.

Findings

1. Achieves exact FP32 training replication on multiple NVIDIA GPUs.

2. Reduces storage and time costs compared to proof-based verifiable training.

3. Successfully applies to full training and fine-tuning of large models.

Abstract

The increasing compute demands of AI systems have led to the emergence of services that train models on behalf of clients lacking necessary resources. However, ensuring correctness of training and guarding against potential training-time attacks, such as data poisoning and backdoors, poses challenges. Existing works on verifiable training largely fall into two classes: proof-based systems, which are difficult to scale, and "optimistic" methods that consider a third-party auditor who can replicate the training process and contest the trainer. A key challenge with the latter is that nondeterminism between GPU types during training prevents exact replication of the training process, resulting in schemes that are non-robust. We propose a method that combines training in a higher precision than the target, rounding after intermediate computations, and sharing rounding decisions based on an…
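The idea of rounding from a higher precision and sharing rounding decisions can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract is truncated here, so the details are assumptions. It assumes FP64 as the higher precision and FP32 as the target, and the names `to_f32`, `ulp32`, `round_and_log`, `replay`, and the `threshold` parameter are all hypothetical. The trainer rounds each value and records the direction only when the value sits close to a rounding boundary; an auditor whose value differs by a tiny hardware-dependent error then follows the logged direction.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ulp32(x: float) -> float:
    """Spacing between adjacent FP32 values near x.

    Simplification: valid for normal, nonzero x only; zero, subnormals,
    and power-of-two boundaries are not handled specially.
    """
    _, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(1.0, e - 24)


def round_and_log(x: float, threshold: float = 0.25):
    """Trainer side: round x to FP32 and flag close calls.

    `threshold` (hypothetical) is the fraction of an FP32 ulp within which
    x counts as being near the midpoint between two FP32 neighbours.
    """
    r = to_f32(x)
    dist_to_midpoint = 0.5 * ulp32(r) - abs(x - r)
    ambiguous = dist_to_midpoint < threshold * ulp32(r)
    rounded_up = r > x
    return r, ambiguous, rounded_up


def replay(x: float, ambiguous: bool, rounded_up: bool) -> float:
    """Auditor side: round x to FP32, following the logged direction on close calls."""
    r = to_f32(x)
    if not ambiguous:
        return r
    if rounded_up and r < x:
        return to_f32(r + ulp32(r))  # step to the FP32 neighbour above
    if not rounded_up and r > x:
        return to_f32(r - ulp32(r))  # step to the FP32 neighbour below
    return r


# Trainer and auditor values straddle the midpoint between 1.0 and 1 + 2**-23.
trainer_value = 1 + 2**-24 + 2**-40
auditor_value = 1 + 2**-24 - 2**-40

r, ambiguous, rounded_up = round_and_log(trainer_value)
print(to_f32(auditor_value) == r)                 # naive rounding on the auditor side
print(replay(auditor_value, ambiguous, rounded_up) == r)  # rounding with the shared decision
```

In this sketch only the flagged decisions would need to be shared, which is one plausible reading of how storage is kept below that of proof-based schemes; how the paper chooses its threshold is not stated in the text above.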

Code & Models

Repositories

meghabyte/verifiable-training (PyTorch, official)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Neural Networks and Applications

Methods: Attention Is All You Need · Residual Connection · Weight Decay · Discriminative Fine-Tuning · Dropout · Softmax · Linear Layer · Dense Connections · Adam