Verifying Classification with Limited Disclosure
Siddharth Bhandari, Liren Shan

TL;DR
This paper develops verification protocols for multi-party classification that minimize disclosure of nonresponsive documents, introducing the Leave-One-Out dimension to quantify disclosure requirements and analyzing trade-offs based on classifier margin.
Contribution
It introduces the Leave-One-Out dimension for classifier verification, characterizes disclosure trade-offs for linear classifiers with margin, and extends protocols to nonrealizable and error-tolerant settings.
Findings
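In the realizable setting, the verification protocols disclose at most as many nonresponsive documents as the Leave-One-Out dimension of the classifier family.
For linear classifiers, the required disclosure depends on the margin: depending on its size, the number of disclosed nonresponsive documents is constant, linear, or exponential in the dimension.
The protocols extend to the nonrealizable setting and to settings that tolerate some misclassification errors.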
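The margin in these findings is the standard geometric notion for linear classifiers. The following is a minimal sketch, not the authors' verification protocol, of how the margin of a candidate linear classifier (w, b) is measured on a labeled document set. It assumes documents are embedded as real feature vectors and labeled +1 for responsive and -1 for nonresponsive; `geometric_margin` is a hypothetical helper name.

```python
import math
from typing import Sequence


def geometric_margin(w: Sequence[float], b: float,
                     xs: Sequence[Sequence[float]], ys: Sequence[int]) -> float:
    """Return min_i y_i * (w . x_i + b) / ||w|| over the labeled points.

    Assumed (hypothetical) encoding: y = +1 for responsive documents,
    y = -1 for nonresponsive ones, each document a real feature vector.
    A positive result means (w, b) classifies every point correctly,
    i.e. the labeled set is realizable by this classifier, and the value
    is its margin; a non-positive result means some point is misclassified
    or lies on the decision boundary.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("weight vector must be nonzero")
    margins = []
    for x, y in zip(xs, ys):
        # Signed score of the document under the linear classifier.
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Signed distance to the hyperplane, positive when correctly labeled.
        margins.append(y * score / norm)
    return min(margins)


# Usage: the hyperplane x_1 = 0 separates these three documents with margin 0.5.
print(geometric_margin([1.0, 0.0], 0.0,
                       [[2.0, 1.0], [-1.0, 3.0], [0.5, -2.0]],
                       [1, -1, 1]))
```

A larger margin means the classifier is more robust to small perturbations of the documents; per the findings above, this is the quantity that governs how many nonresponsive documents must be disclosed.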
Abstract
We consider the multi-party classification problem introduced by Dong, Hartline, and Vijayaraghavan (2022) motivated by electronic discovery. In this problem, our goal is to design a protocol that guarantees the requesting party receives nearly all responsive documents while minimizing the disclosure of nonresponsive documents. We develop verification protocols that certify the correctness of a classifier by disclosing a few nonresponsive documents. We introduce a combinatorial notion called the Leave-One-Out dimension of a family of classifiers and show that the number of nonresponsive documents disclosed by our protocol is at most this dimension in the realizable setting, where a perfect classifier exists in this family. For linear classifiers with a margin, we characterize the trade-off between the margin and the number of nonresponsive documents that must be disclosed for…
Taxonomy
Topics: Imbalanced Data Classification Techniques
