Structure of Classifier Boundaries: Case Study for a Naive Bayes Classifier

Alan F. Karr; Zac Bowen; Adam A. Porter; Regina Ruane

arXiv:2212.04382·stat.ML·December 24, 2025

TL;DR

This paper investigates the complex boundary structure of a Naive Bayes classifier in a graph input space, introduces a new uncertainty measure called Neighbor Similarity, and applies it to DNA read classification.

Contribution

It provides a detailed analysis of classifier boundary structures and introduces Neighbor Similarity as a novel uncertainty measure applicable beyond Bayesian classifiers.

Findings

01 The classifier's boundary is large and complex in structure

02 Neighbor Similarity tracks two inherent uncertainty measures of the Bayes classifier

03 The measure can be implemented for classifiers without inherent uncertainty measures

Abstract

Classifiers assign complex input data points to one of a small number of output categories. For a Bayes classifier whose input space is a graph, we study the structure of the boundary, which comprises those points for which at least one neighbor is classified differently. The scientific setting is assignment of DNA reads produced by next-generation sequencers (NGSs) to candidate source genomes. The boundary is both large and complicated in structure. We introduce a new measure of uncertainty, Neighbor Similarity, that compares the result for an input point to the distribution of results for its neighbors. This measure not only tracks two inherent uncertainty measures for the Bayes classifier, but also can be implemented for classifiers without inherent measures of uncertainty.
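The boundary definition in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the graph connects reads at Hamming distance 1, `toy_classifier` is a hypothetical GC-content rule standing in for the Bayes classifier, and `neighbor_agreement` is a simplified stand-in score (the fraction of neighbors sharing a read's label), not the paper's definition of Neighbor Similarity.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def neighbors(read):
    """Yield all sequences at Hamming distance 1 (assumed graph structure)."""
    for i, base in enumerate(read):
        for other in ALPHABET:
            if other != base:
                yield read[:i] + other + read[i + 1:]


def toy_classifier(read):
    """Hypothetical stand-in classifier: label a read by its GC content."""
    gc = sum(base in "GC" for base in read)
    return "genome_A" if 2 * gc >= len(read) else "genome_B"


def on_boundary(read, classify):
    """True if at least one neighbor is classified differently from the read."""
    label = classify(read)
    return any(classify(n) != label for n in neighbors(read))


def neighbor_agreement(read, classify):
    """Fraction of neighbors sharing the read's label (illustrative score only)."""
    label = classify(read)
    counts = Counter(classify(n) for n in neighbors(read))
    return counts[label] / sum(counts.values())


# Enumerate every length-4 read and collect those on the boundary.
reads = ["".join(p) for p in product(ALPHABET, repeat=4)]
boundary = [r for r in reads if on_boundary(r, toy_classifier)]
print(len(boundary), "of", len(reads), "reads lie on the boundary")
print(neighbor_agreement("GGGG", toy_classifier))
print(neighbor_agreement("GGAA", toy_classifier))
```

Because the score only needs the labels a classifier assigns to a point and to its neighbors, a computation of this kind does not depend on the classifier exposing posterior probabilities, which is in the spirit of the abstract's remark about classifiers without inherent uncertainty measures.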

Taxonomy

Topics: Machine Learning in Bioinformatics · Machine Learning and Algorithms · Machine Learning and Data Classification