Quantifying the Uncertainty of Precision Estimates for Rule based Text Classifiers

James Nutaro; Ozgur Ozmen

arXiv:2005.09198 · cs.LG · May 20, 2020

TL;DR

This paper introduces a method for quantifying the uncertainty of precision estimates in rule-based text classifiers by modeling sub-string partitions as Bernoulli variables and applying statistical tests, with an extension to multi-label classification using Dempster-Shafer theory.

Contribution

It presents a novel approach to measure and compare the precision of rule-based classifiers and combines multiple classifiers into a multi-label system using evidence theory.

Findings

01 Effective quantification of classifier precision uncertainty.

02 Successful application to a benchmark problem.

03 Enhanced multi-label classification through evidence combination.

Abstract

Rule based classifiers that use the presence and absence of key sub-strings to make classification decisions have a natural mechanism for quantifying the uncertainty of their precision. For a binary classifier, the key insight is to treat partitions of the sub-string set induced by the documents as Bernoulli random variables. The mean value of each random variable is an estimate of the classifier's precision when presented with a document inducing that partition. These means can be compared, using standard statistical tests, to a desired or expected classifier precision. A set of binary classifiers can be combined into a single, multi-label classifier by an application of the Dempster-Shafer theory of evidence. The utility of this approach is demonstrated with a benchmark problem.
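
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (induced_partition, precision_by_partition, binomial_lower_tail, dempster_combine) is hypothetical. It assumes that each labelled document records whether the binary classifier's decision was correct, that a one-sided exact binomial test stands in for the "standard statistical tests", and that mass functions are dictionaries keyed by frozenset focal elements. How the paper turns precision estimates into masses is not stated here, so that mapping is left to the caller.

```python
from collections import defaultdict
from math import comb


def induced_partition(document, substrings):
    """Return the key sub-strings present in the document.

    The split of the sub-string set into present and absent members is
    the partition the document induces. Sub-strings are assumed lowercase.
    """
    text = document.lower()
    return frozenset(s for s in substrings if s in text)


def precision_by_partition(labelled_docs, substrings):
    """Count Bernoulli outcomes per induced partition.

    labelled_docs holds (text, decision_was_correct) pairs (an assumed
    input format). Returns {partition: (correct, total)}; the ratio
    correct / total is the precision estimate for that partition.
    """
    counts = defaultdict(lambda: [0, 0])
    for text, correct in labelled_docs:
        key = induced_partition(text, substrings)
        counts[key][0] += int(correct)
        counts[key][1] += 1
    return {key: (c, n) for key, (c, n) in counts.items()}


def binomial_lower_tail(k, n, p0):
    """P(X <= k) for X ~ Binomial(n, p0).

    Used as a one-sided exact p-value for H0: precision >= p0.
    A small value suggests the partition falls short of the target.
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k + 1))


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Products of masses go to the intersection of their focal elements;
    mass landing on the empty set is conflict and is normalised away.
    """
    combined = defaultdict(float)
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            common = a & b
            if common:
                combined[common] += wa * wb
            else:
                conflict += wa * wb
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {key: w / (1.0 - conflict) for key, w in combined.items()}
```

As a usage illustration, binomial_lower_tail(7, 10, 0.9) is about 0.07, so 7 correct decisions out of 10 for a partition is only weak evidence against a target precision of 0.9. For the combination step, one possible (assumed) encoding puts a classifier's estimated precision on its own label and the remainder on the full label set before calling dempster_combine.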

Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Rough Sets and Fuzzy Logic