Verifying Classification with Limited Disclosure

Siddharth Bhandari; Liren Shan

arXiv:2502.16352 · cs.LG · February 27, 2025

Open Access

TL;DR

This paper develops verification protocols for multi-party classification that minimize disclosure of nonresponsive documents, introducing the Leave-One-Out dimension to quantify disclosure requirements and analyzing trade-offs based on classifier margin.

Contribution

It introduces the Leave-One-Out dimension for classifier verification, characterizes disclosure trade-offs for linear classifiers with margin, and extends protocols to nonrealizable and error-tolerant settings.

Findings

01

In the realizable setting, the verification protocol discloses at most as many nonresponsive documents as the Leave-One-Out dimension of the classifier family.

02

For linear classifiers with a margin, the number of nonresponsive documents that must be disclosed depends on the margin: it is constant, linear, or exponential in the dimension.

03

The protocols extend to the nonrealizable setting and to settings that tolerate misclassification errors.

Abstract

We consider the multi-party classification problem introduced by Dong, Hartline, and Vijayaraghavan (2022) motivated by electronic discovery. In this problem, our goal is to design a protocol that guarantees the requesting party receives nearly all responsive documents while minimizing the disclosure of nonresponsive documents. We develop verification protocols that certify the correctness of a classifier by disclosing a few nonresponsive documents. We introduce a combinatorial notion called the Leave-One-Out dimension of a family of classifiers and show that the number of nonresponsive documents disclosed by our protocol is at most this dimension in the realizable setting, where a perfect classifier exists in this family. For linear classifiers with a margin, we characterize the trade-off between the margin and the number of nonresponsive documents that must be disclosed for…
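The abstract names the Leave-One-Out dimension but this page does not reproduce its formal definition. As a rough illustration only, the sketch below assumes one plausible reading: the size of the largest set of points in which every point can be labeled positive by some classifier in the family while all other points in the set are labeled negative. The function name `leave_one_out_dimension`, the toy point set, and the two toy classifier families are hypothetical and not taken from the paper.

```python
from itertools import combinations


def leave_one_out_dimension(points, classifiers):
    """Brute-force search under an ASSUMED definition (not quoted from the paper):
    the size of the largest subset S of `points` such that, for every x in S,
    some classifier labels x positive and every other point of S negative.
    Points are assumed to be distinct."""

    def isolates(h, x, subset):
        # h labels x positive and all other points in the subset negative.
        return h(x) and not any(h(y) for y in subset if y != x)

    # Try the largest subsets first so the first hit is the maximum size.
    for size in range(len(points), 0, -1):
        for subset in combinations(points, size):
            if all(any(isolates(h, x, subset) for h in classifiers) for x in subset):
                return size
    return 0


# Hypothetical toy families over four points on a line.
points = [0, 1, 2, 3]
thresholds = [lambda x, t=t: x >= t for t in range(5)]  # positive iff x >= t
singletons = [lambda x, i=i: x == i for i in points]    # positive only at i

threshold_dim = leave_one_out_dimension(points, thresholds)
singleton_dim = leave_one_out_dimension(points, singletons)
```

Under this assumed definition, the threshold family can never isolate the smaller of two points, while the singleton family can isolate every point, so the two families sit at opposite ends of the range. The search is exponential in the number of points and is meant only for small examples.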

Taxonomy

Topics: Imbalanced Data Classification Techniques