A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels
Robert J. Joyce, Edward Raff, Charles Nicholas

TL;DR
This paper introduces the approximate ground truth refinement (AGTR), a way to evaluate clustering and classification models without high-quality reference labels. It supports computing bounds on performance metrics and detecting biased evaluation results when only low-quality reference data is available.
Contribution
The paper proposes the AGTR evaluation framework, which yields bounds on evaluation metrics and flags biased evaluation results without reference labels, and demonstrates it on malware family classification.
Findings
Bounds on specific metrics for clustering algorithms and multi-class classifiers can be computed without reference labels.
An AGTR can be used to identify inaccurate evaluation results produced from datasets of dubious quality.
Applying the framework to malware family classification revealed over-fitting and made it possible to assess effects that could not previously be quantified.
Abstract
In some problem spaces, the high cost of obtaining ground truth labels necessitates use of lower quality reference datasets. It is difficult to benchmark model performance using these datasets, as evaluation results may be biased. We propose a supplement to using reference labels, which we call an approximate ground truth refinement (AGTR). Using an AGTR, we prove that bounds on specific metrics used to evaluate clustering algorithms and multi-class classifiers can be computed without reference labels. We also introduce a procedure that uses an AGTR to identify inaccurate evaluation results produced from datasets of dubious quality. Creating an AGTR requires domain knowledge, and malware family classification is a task with robust domain knowledge approaches that support the construction of an AGTR. We demonstrate our AGTR evaluation framework by applying it to a popular malware…
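The bounding idea in the abstract can be illustrated with a small sketch. The abstract does not say which metrics are bounded, so everything here is an assumption: the sketch uses the common max-overlap definitions of clustering precision and recall, and it treats an AGTR as a grouping in which every group lies entirely inside one true group. The names `precision`, `recall`, `truth`, `agtr`, and `pred` are hypothetical. Under those assumptions, precision measured against the AGTR cannot exceed precision measured against the true labels, and recall measured against the AGTR cannot fall below recall measured against the true labels.

```python
from collections import Counter, defaultdict


def precision(pred, ref):
    """Max-overlap clustering precision (assumed definition, not taken from the source).

    For each predicted cluster, take its largest overlap with any single
    reference group, sum those overlaps, and divide by the number of items.
    """
    best = defaultdict(int)
    for (p, _r), n in Counter(zip(pred, ref)).items():
        best[p] = max(best[p], n)
    return sum(best.values()) / len(pred)


def recall(pred, ref):
    """Max-overlap clustering recall: the same computation with the roles swapped."""
    return precision(ref, pred)


# Toy data for six items. `truth` is normally unavailable; it is included
# here only to check the bounds on a made-up example.
truth = ["A", "A", "A", "A", "B", "B"]
# Hypothetical AGTR: family A is split into two groups, B is kept whole,
# so every AGTR group sits inside exactly one true family.
agtr = ["a1", "a1", "a2", "a2", "b1", "b1"]
# Hypothetical output of the clustering model under evaluation.
pred = ["x", "x", "x", "y", "y", "y"]

print("precision vs AGTR :", precision(pred, agtr))
print("precision vs truth:", precision(pred, truth))
print("recall vs AGTR    :", recall(pred, agtr))
print("recall vs truth   :", recall(pred, truth))
```

On this toy data, precision against the AGTR is 4/6 and precision against the true labels is 5/6, while recall is 5/6 in both cases, which is consistent with a lower bound on precision and an upper bound on recall under the stated assumptions.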
