Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts
Surin Ahn, Ayfer Ozgur, Mert Pilanci

TL;DR
This paper introduces a method for aggregating the predictions of heterogeneous local classifiers, each trained on only a subset of the classes, to perform global multiclass classification. The goal is to reduce the number of classifiers needed while maintaining high accuracy, and the approach is demonstrated on the MNIST and CIFAR-10 datasets.
Contribution
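The paper models multiclass classification as the aggregation of local classifiers, each trained on a subset of the classes. It derives bounds on the number of classifiers needed under adversarial and stochastic assumptions, and proposes a near-optimal scheme for configuring the classifiers that builds on a connection to the classical set cover problem, with applications to dataset labeling.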
Findings
Achieves high accuracy comparable to centralized classifiers.
Provides bounds on the number of classifiers needed under both adversarial and stochastic assumptions.
Demonstrates effectiveness on MNIST and CIFAR-10 datasets.
Abstract
In the domains of dataset construction and crowdsourcing, a notable challenge is to aggregate labels from a heterogeneous set of labelers, each of whom is potentially an expert in some subset of tasks (and less reliable in others). To reduce costs of hiring human labelers or training automated labeling systems, it is of interest to minimize the number of labelers while ensuring the reliability of the resulting dataset. We model this as the problem of performing K-class classification using the predictions of smaller classifiers, each trained on a subset of the K classes, and derive bounds on the number of classifiers needed to accurately infer the true class of an unlabeled sample under both adversarial and stochastic assumptions. By exploiting a connection to the classical set cover problem, we produce a near-optimal scheme for designing such configurations of classifiers which recovers the…
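To make the set cover connection concrete, here is a minimal sketch of a greedy cover over class pairs. It is an illustration only, not the authors' scheme. It assumes (this is not stated in the abstract) that a configuration is usable when every pair of classes appears together in at least one local classifier's class subset. The function name `greedy_pair_cover` and the `candidates` argument are hypothetical.

```python
from itertools import combinations


def greedy_pair_cover(num_classes, candidates):
    """Greedily pick class subsets until every pair of classes
    appears together in at least one chosen subset.

    Assumption (illustrative): covering all pairs is the design goal.
    `candidates` is a list of class subsets a local classifier could be trained on.
    """
    uncovered = set(combinations(range(num_classes), 2))
    chosen = []
    while uncovered:
        # Standard greedy set cover step: take the subset that covers
        # the most still-uncovered pairs.
        best = max(
            candidates,
            key=lambda s: len(uncovered & set(combinations(sorted(s), 2))),
        )
        gain = uncovered & set(combinations(sorted(best), 2))
        if not gain:
            raise ValueError("candidates cannot cover every pair of classes")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Example: 4 classes, each local classifier may handle any 3 of them.
candidates = list(combinations(range(4), 3))
print(greedy_pair_cover(4, candidates))
```

Greedy selection is the textbook approximation for set cover; whether it matches the paper's near-optimal construction is not something this sketch claims.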
