Quantifying the Uncertainty of Precision Estimates for Rule based Text Classifiers
James Nutaro, Ozgur Ozmen

TL;DR
This paper introduces a method for quantifying the uncertainty of precision estimates in rule-based text classifiers by modeling sub-string partitions as Bernoulli variables and applying statistical tests, with an extension to multi-label classification using Dempster-Shafer theory.
Contribution
The paper presents a novel approach for measuring the precision of rule-based classifiers and comparing it against a desired precision, and it combines multiple binary classifiers into a single multi-label system using the Dempster-Shafer theory of evidence.
Findings
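The uncertainty of a classifier's precision estimate is quantified per sub-string partition and compared to a desired precision with standard statistical tests.
The approach is demonstrated on a benchmark problem.
Binary classifiers are combined into a single multi-label classifier through Dempster-Shafer evidence combination.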
Abstract
Rule based classifiers that use the presence and absence of key sub-strings to make classification decisions have a natural mechanism for quantifying the uncertainty of their precision. For a binary classifier, the key insight is to treat partitions of the sub-string set induced by the documents as Bernoulli random variables. The mean value of each random variable is an estimate of the classifier's precision when presented with a document inducing that partition. These means can be compared, using standard statistical tests, to a desired or expected classifier precision. A set of binary classifiers can be combined into a single, multi-label classifier by an application of the Dempster-Shafer theory of evidence. The utility of this approach is demonstrated with a benchmark problem.
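The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy documents, the choice of an exact one-sided binomial test as the "standard statistical test", and the way mass is assigned to label sets are all assumptions made for illustration. Each labeled document is assumed to carry a flag saying whether the classifier's decision on it was correct, and a partition is represented by the set of key sub-strings present in the document (the rest are absent).

```python
from collections import defaultdict
from math import comb


def induced_partition(document, substrings):
    """Key sub-strings present in the document; the others are absent."""
    return frozenset(s for s in substrings if s in document)


def partition_counts(labeled_docs, substrings):
    """Map each induced partition to [correct decisions, documents seen]."""
    counts = defaultdict(lambda: [0, 0])
    for text, is_correct in labeled_docs:
        key = induced_partition(text, substrings)
        counts[key][0] += int(is_correct)
        counts[key][1] += 1
    return dict(counts)


def binomial_lower_tail(k, n, p0):
    """P(X <= k) for X ~ Binomial(n, p0).

    Used here as a one-sided p-value for the hypothesis that the
    partition's precision is at least the target p0 (an assumed test).
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k + 1))


def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of labels to its mass.
    """
    combined = defaultdict(float)
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            common = a & b
            if common:
                combined[common] += wa * wb
            else:
                conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical toy data: (document text, was the classifier's decision correct?)
docs = [
    ("please refund my invoice", True),
    ("refund requested", True),
    ("refund policy page", False),
    ("refund now", True),
    ("invoice attached", False),
]
keys = ["refund", "invoice"]

for part, (correct, total) in partition_counts(docs, keys).items():
    estimate = correct / total  # mean of the Bernoulli variable
    p_value = binomial_lower_tail(correct, total, 0.9)
    print(sorted(part), f"precision={estimate:.2f}", f"p={p_value:.3f}")

# Two hypothetical binary classifiers expressing belief over labels A and B.
A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")
print(dempster_combine({A: 0.6, AB: 0.4}, {B: 0.5, AB: 0.5}))
```

With so few documents per partition the test has little power; the point of the sketch is only the shape of the computation: one Bernoulli mean per partition, a comparison of that mean to a target precision, and a normalized product of masses when classifiers are combined.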
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Rough Sets and Fuzzy Logic
