Optimistic Verifiable Training by Controlling Hardware Nondeterminism
Megha Srivastava, Simran Arora, Dan Boneh

TL;DR
This paper introduces a verifiable training method that controls hardware nondeterminism so that a training run can be replicated exactly on a different GPU type. This makes optimistic (auditor-based) verification robust and cheaper than proof-based alternatives.
Contribution
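It presents a novel approach with three parts: training at a higher precision than the target, rounding after intermediate computations, and sharing rounding decisions selected by adaptive thresholding. Together these make training reproducible, and therefore verifiable, across different GPU types.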
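The following is a minimal NumPy sketch of how higher-precision training, rounding, and shared rounding decisions fit together. It assumes FP64 as the working precision and FP32 as the target.

The helper names (`float32_neighbors`, `trainer_round`, `auditor_round`) are hypothetical. So is the threshold rule: a fixed fraction `frac` of the local FP32 spacing. This is a simplification, not the paper's adaptive thresholding procedure.

In the sketch, the trainer rounds each FP64 result to FP32. It logs the rounding direction only when the value lies within the threshold of the midpoint between two adjacent FP32 values. The auditor's FP64 result may differ slightly, so in exactly those cases it reuses the logged direction.

```python
import numpy as np

def float32_neighbors(x):
    """Return the adjacent float32 values (lo, hi) that bracket the float64 value x."""
    r = np.float32(x)  # IEEE round-to-nearest
    if float(r) <= x:
        return r, np.nextafter(r, np.float32(np.inf))
    return np.nextafter(r, np.float32(-np.inf)), r

def trainer_round(x, frac=1e-3):
    """Round x to float32 and return (value, log).

    log is True/False (rounded up/down) if x is near a rounding boundary, else None.
    Hypothetical rule: "near" means within frac * (local float32 spacing).
    """
    lo, hi = float32_neighbors(x)
    mid = (float(lo) + float(hi)) / 2.0  # midpoint is exact in float64
    rounded = np.float32(x)
    if abs(x - mid) < frac * (float(hi) - float(lo)):
        return rounded, bool(rounded == hi)
    return rounded, None

def auditor_round(x, logged_up):
    """Round x to float32, following the trainer's logged direction if one exists."""
    if logged_up is None:
        return np.float32(x)
    lo, hi = float32_neighbors(x)
    return hi if logged_up else lo

# Two slightly different float64 results that straddle the midpoint
# between the float32 value 1.0 and its successor.
lo = np.float32(1.0)
hi = np.nextafter(lo, np.float32(np.inf))
mid = (float(lo) + float(hi)) / 2.0
x_trainer = mid + 2.0 ** -40
x_auditor = mid - 2.0 ** -40

print(np.float32(x_trainer) == np.float32(x_auditor))  # naive rounding: False
t_val, log = trainer_round(x_trainer)
a_val = auditor_round(x_auditor, log)
print(log, t_val == a_val)  # True True
```

Plain round-to-nearest makes the two runs disagree, while sharing the logged direction makes them agree. The scheme only holds if the threshold exceeds the largest FP64 discrepancy between devices. Values farther from a midpoint than that round identically on both sides, so only near-boundary cases need a log entry.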
Findings
Achieves exact FP32 training replication on multiple NVIDIA GPUs.
Reduces storage and time costs compared to proof-based verifiable training.
Successfully applies to full training and fine-tuning of large models.
Abstract
The increasing compute demands of AI systems have led to the emergence of services that train models on behalf of clients lacking necessary resources. However, ensuring correctness of training and guarding against potential training-time attacks, such as data poisoning and backdoors, poses challenges. Existing works on verifiable training largely fall into two classes: proof-based systems, which are difficult to scale, and "optimistic" methods that consider a third-party auditor who can replicate the training process and contest the trainer. A key challenge with the latter is that nondeterminism between GPU types during training prevents exact replication of the training process, resulting in schemes that are non-robust. We propose a method that combines training in a higher precision than the target, rounding after intermediate computations, and sharing rounding decisions based on an…
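The abstract attributes failed replication to nondeterminism between GPU types. One well-known source of such nondeterminism is that floating-point addition is not associative. As a result, two devices or kernels that reduce the same FP32 values in a different order can return different results.

The minimal NumPy sketch below demonstrates this. `sequential_sum` and `pairwise_sum` are hypothetical stand-ins for two reduction orders, not actual GPU kernels.

```python
import numpy as np

def sequential_sum(xs):
    """Accumulate float32 values left to right, like a single-threaded loop."""
    acc = np.float32(0.0)
    for x in xs:
        acc = np.float32(acc + x)  # each addition rounds to float32
    return acc

def pairwise_sum(xs):
    """Reduce float32 values in a binary tree, loosely like a parallel reduction."""
    vals = [np.float32(x) for x in xs]
    while len(vals) > 1:
        nxt = [np.float32(vals[i] + vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:  # carry an unpaired trailing element forward
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

xs = np.array([1e8, 1.0, -1e8, 1.0], dtype=np.float32)
print(sequential_sum(xs), pairwise_sum(xs))  # 1.0 0.0 (exact sum is 2)
```

Both orders are valid FP32 computations on identical inputs, yet they disagree. Over many training steps, discrepancies like this can accumulate, which is why bitwise replication across GPU types fails without additional control.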
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Neural Networks and Applications
Methods: Attention Is All You Need · Residual Connection · Weight Decay · Discriminative Fine-Tuning · Dropout · Softmax · Linear Layer · Dense Connections · Adam
