Spoofing Generalization: When Can't You Trust Proprietary Models?

Ankur Moitra; Elchanan Mossel; Colin Sandon

arXiv:2106.08393 · cs.LG · March 25, 2022

TL;DR

This paper explores the computational difficulty of distinguishing between genuinely accurate models and maliciously spoofed models that perfectly fit training data but do not generalize, highlighting trust issues in proprietary models.

Contribution

It introduces the concepts of strong and weak spoofing and shows that strong spoofing is possible under cryptographic assumptions, while c-weak spoofing is possible unconditionally for any c > 0, revealing fundamental challenges in trusting proprietary machine learning models.

Findings

01

Strong spoofing is possible under cryptographic assumptions.

02

For any fixed c > 0, c-weak spoofing (against algorithms that run in $n^{c}$ time) is possible unconditionally.

03

The results highlight inherent trust issues in proprietary models.

Abstract

In this work, we study the computational complexity of determining whether a machine learning model that perfectly fits the training data will generalize to unseen data. In particular, we study the power of a malicious agent whose goal is to construct a model g that fits its training data and nothing else, but is indistinguishable from an accurate model f. We say that g strongly spoofs f if no polynomial-time algorithm can tell them apart. If instead we restrict to algorithms that run in $n^{c}$ time for some fixed $c$, we say that g c-weakly spoofs f. Our main results are: (1) under cryptographic assumptions, strong spoofing is possible, and (2) for any c > 0, c-weak spoofing is possible unconditionally. While the assumption of a malicious agent is an extreme scenario (hopefully companies training large models are not malicious), we believe that it sheds light on the inherent…
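To make the phrase "fits its training data and nothing else" concrete, here is a minimal Python sketch. It is a toy illustration only, not the paper's construction: the target `f` (a parity of even-indexed bits), the memorizing model built by `make_memorizer`, the input length `N_BITS`, and the sample sizes are all hypothetical choices made for this example.

```python
import hashlib
import random

N_BITS = 32  # input length n (illustrative choice, not from the paper)


def f(x):
    """Hypothetical 'accurate' model: parity of the even-indexed input bits."""
    return sum(x[0::2]) % 2


def make_memorizer(train_x, train_y, seed=1):
    """Build g: a lookup table on the training set, hash-derived labels elsewhere."""
    table = {tuple(x): y for x, y in zip(train_x, train_y)}

    def g(x):
        key = tuple(x)
        if key in table:
            return table[key]  # perfect fit on every training point
        # Off the training set, output a fixed but arbitrary bit unrelated to f.
        digest = hashlib.sha256(seed.to_bytes(4, "big") + bytes(key)).digest()
        return digest[0] & 1

    return g


def agreement(model_a, model_b, xs):
    """Fraction of inputs in xs on which the two models give the same label."""
    return sum(model_a(x) == model_b(x) for x in xs) / len(xs)


rng = random.Random(0)


def sample(m):
    """Draw m uniformly random N_BITS-bit inputs."""
    return [tuple(rng.randint(0, 1) for _ in range(N_BITS)) for _ in range(m)]


train_x = sample(200)
train_y = [f(x) for x in train_x]
g = make_memorizer(train_x, train_y)

train_agree = agreement(f, g, train_x)       # g matches f on all training data
fresh_agree = agreement(f, g, sample(2000))  # on unseen inputs g is close to a coin flip
print(train_agree, fresh_agree)
```

On the training set the two models agree everywhere, so training accuracy alone cannot separate them. In this toy, comparing labels on fresh inputs exposes `g` quickly. The abstract's notion of spoofing is far stronger: g must be indistinguishable from f to every polynomial-time (or $n^{c}$-time) algorithm, which this naive memorizer is not claimed to achieve.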

Taxonomy

Topics: Computability, Logic, AI Algorithms · Cryptography and Data Security · Machine Learning and Algorithms