Structure of Classifier Boundaries: Case Study for a Naive Bayes Classifier
Alan F. Karr, Zac Bowen, Adam A. Porter, Regina Ruane

TL;DR
This paper investigates the complex boundary structure of a Naive Bayes classifier in a graph input space, introduces a new uncertainty measure called Neighbor Similarity, and applies it to DNA read classification.
Contribution
It provides a detailed analysis of classifier boundary structures and introduces Neighbor Similarity as a novel uncertainty measure applicable beyond Bayesian classifiers.
Findings
The boundary of the classifier is large and complex in structure
Neighbor Similarity tracks the two inherent uncertainty measures of the Bayes classifier
Neighbor Similarity can be applied to classifiers that have no inherent uncertainty measure
Abstract
Classifiers assign complex input data points to one of a small number of output categories. For a Bayes classifier whose input space is a graph, we study the structure of the boundary, which comprises those points for which at least one neighbor is classified differently. The scientific setting is the assignment of DNA reads produced by next-generation sequencers (NGSs) to candidate source genomes. The boundary is both large and complicated in structure. We introduce a new measure of uncertainty, Neighbor Similarity, that compares the result for an input point to the distribution of results for its neighbors. This measure not only tracks two inherent uncertainty measures for the Bayes classifier, but can also be implemented for classifiers without inherent measures of uncertainty.
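The two ideas in the abstract, boundary membership and comparing a point's result with its neighbors' results, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the graph is assumed to connect reads that differ in exactly one base, `toy_classifier` is a hypothetical stand-in for the Bayes classifier, and `neighbor_agreement` (the fraction of neighbors sharing the point's label) is a simplified proxy, not the paper's definition of Neighbor Similarity.

```python
from collections import Counter
from typing import Callable, Iterator

ALPHABET = "ACGT"


def hamming_neighbors(read: str) -> Iterator[str]:
    """Yield every sequence that differs from `read` at exactly one position.

    Assumption: this single-substitution graph stands in for the paper's
    input-space graph, which is not specified in the abstract.
    """
    for i, base in enumerate(read):
        for alt in ALPHABET:
            if alt != base:
                yield read[:i] + alt + read[i + 1:]


def is_boundary(read: str, classify: Callable[[str], str]) -> bool:
    """A point is on the boundary if at least one neighbor gets a different label."""
    label = classify(read)
    return any(classify(n) != label for n in hamming_neighbors(read))


def neighbor_agreement(read: str, classify: Callable[[str], str]) -> float:
    """Fraction of neighbors assigned the same label as `read`.

    Simplified proxy for illustration only; not the paper's Neighbor Similarity.
    """
    label = classify(read)
    counts = Counter(classify(n) for n in hamming_neighbors(read))
    total = sum(counts.values())
    return counts[label] / total if total else 1.0


def toy_classifier(read: str) -> str:
    """Hypothetical classifier: label by whether G/C bases form a strict majority."""
    gc = sum(base in "GC" for base in read)
    return "genome_A" if 2 * gc > len(read) else "genome_B"


if __name__ == "__main__":
    for read in ("GGGG", "GGCA"):
        print(read, is_boundary(read, toy_classifier),
              neighbor_agreement(read, toy_classifier))
```

Because both functions only call `classify` on a point and its neighbors, the same sketch works for any classifier that returns a label, which is the property the abstract highlights for classifiers without an inherent uncertainty measure.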
Taxonomy
Topics: Machine Learning in Bioinformatics · Machine Learning and Algorithms · Machine Learning and Data Classification
