A logical alarm for misaligned binary classifiers
Andrés Corrada-Emmanuel, Ilya Parker, Ramesh Bharadwaj

TL;DR
This paper introduces a logical framework for evaluating binary classifiers based on their agreements and disagreements, enabling detection of malfunctioning classifiers without labeled data, with implications for safe AI.
Contribution
It develops a set of axioms for assessing binary classifiers' consistency and constructs a logical alarm to identify malfunctioning agents using only unlabeled data.
Findings
A complete set of axioms for ensemble evaluation is established.
The logical alarm can detect at least one malfunctioning classifier without labeled data.
Connections to formal software verification and safe AI are discussed.
Abstract
If two agents disagree in their decisions, we may suspect they are not both correct. This intuition is formalized for evaluating agents that have carried out a binary classification task. Their agreements and disagreements on a joint test allow us to establish the only group evaluations logically consistent with their responses. This is done by establishing a set of axioms (algebraic relations) that must be universally obeyed by all evaluations of binary responders. A complete set of such axioms is possible for each ensemble of size N. The axioms are used to construct a fully logical alarm: one that can prove that at least one ensemble member is malfunctioning using only unlabeled data. The similarities of this approach to formal software verification and its utility for recent agendas of safe guaranteed AI are discussed.
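To make the idea of a label-free alarm concrete, here is a minimal sketch for a pair of classifiers. It does not use the paper's axioms. Instead it brute-forces every assignment of true labels that is consistent with the observed response pairs, which only works for small tests. The function name `alarm_fires`, the `counts` layout, and the safety requirement (each classifier must be strictly more than `threshold` accurate on each label) are assumptions made for this illustration.

```python
from itertools import product


def alarm_fires(counts, threshold=0.5):
    """Return True if no true-label assignment lets BOTH classifiers
    exceed `threshold` accuracy on BOTH labels.

    `counts` maps a response pair (r1, r2), with r1, r2 in {"a", "b"},
    to the number of test items that received that pair of responses.
    This layout is hypothetical, chosen for the sketch.
    """
    patterns = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]
    n = [counts.get(p, 0) for p in patterns]

    def ok(correct, total):
        # A label with no items is treated as vacuously satisfied.
        return total == 0 or correct > threshold * total

    # k[i] = how many items with response pattern i truly have label "a".
    for k in product(*(range(m + 1) for m in n)):
        q_a = sum(k)           # items whose true label is "a"
        q_b = sum(n) - q_a     # items whose true label is "b"
        # Classifier 1 answered "a" on patterns 0, 1 and "b" on 2, 3.
        c1_a = k[0] + k[1]
        c1_b = (n[2] - k[2]) + (n[3] - k[3])
        # Classifier 2 answered "a" on patterns 0, 2 and "b" on 1, 3.
        c2_a = k[0] + k[2]
        c2_b = (n[1] - k[1]) + (n[3] - k[3])
        if ok(c1_a, q_a) and ok(c1_b, q_b) and ok(c2_a, q_a) and ok(c2_b, q_b):
            return False  # a consistent "everyone is fine" evaluation exists
    return True  # every consistent evaluation has a failing classifier


# Two classifiers that disagree on every item cannot both be better than 50%.
total_disagreement = alarm_fires({("a", "b"): 5, ("b", "a"): 5})
# Two classifiers that always agree could both be perfect, so no alarm.
total_agreement = alarm_fires({("a", "a"): 5, ("b", "b"): 5})
```

In this sketch the alarm is one-sided: when it fires, at least one classifier provably violates the requirement, but silence does not prove that both are working correctly.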
Taxonomy
Topics: Fault Detection and Control Systems · Neural Networks and Applications · Data Stream Mining Techniques
Methods: Sparse Evolutionary Training
