Feature Selection Facilitates Learning Mixtures of Discrete Product Distributions
Vincent Zhao, Steven W. Zucker

TL;DR
This paper introduces a feature selection algorithm based on mutual information that improves the learning of mixtures of discrete product distributions, particularly in crowdsourcing, by ranking workers and removing unreliable ones so that learning is more robust.
Contribution
The paper proposes a novel feature selection method based on pairwise mutual information, tailored to mixtures of discrete distributions, and validates it empirically.
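For reference, the pairwise quantity behind such a method is the mutual information between two workers' labels. A standard plug-in estimate is shown below, assuming both workers i and j label the same n items and writing x_{it} for worker i's label on item t (this notation is introduced here, not taken from the paper). Terms with a zero joint frequency contribute zero, and the paper may use a different estimator or normalization.

```latex
\hat{I}(X_i; X_j) = \sum_{a,b} \hat{p}_{ij}(a,b)\,\log \frac{\hat{p}_{ij}(a,b)}{\hat{p}_i(a)\,\hat{p}_j(b)},
\qquad
\hat{p}_{ij}(a,b) = \frac{1}{n}\sum_{t=1}^{n} \mathbf{1}[x_{it}=a,\ x_{jt}=b],
\qquad
\hat{p}_{i}(a) = \frac{1}{n}\sum_{t=1}^{n} \mathbf{1}[x_{it}=a]
```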
Findings
Substantial improvement on real data sets.
Effective ordering of workers based on mutual information.
Enhanced robustness when learning mixtures.
Abstract
Feature selection can facilitate the learning of mixtures of discrete random variables as they arise, e.g. in crowdsourcing tasks. Intuitively, not all workers are equally reliable but, if the less reliable ones could be eliminated, then learning should be more robust. By analogy with Gaussian mixture models, we seek a low-order statistical approach, and here introduce an algorithm based on the (pairwise) mutual information. This induces an order over workers that is well structured for the 'one coin' model. More generally, it is justified by a goodness-of-fit measure and is validated empirically. Improvement in real data sets can be substantial.
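The ranking idea in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' algorithm: it simulates crowd labels from a one-coin model (each worker j reports the true class with a fixed probability p_j and otherwise a uniformly random wrong class), scores each worker by its mean plug-in mutual information with every other worker, and sorts workers by that score. The helper names (`simulate_one_coin`, `mutual_information`, `rank_workers_by_mi`), the mean-MI score, the top-k cut, and the accuracies in the demo are all hypothetical; the paper's goodness-of-fit justification is not reproduced.

```python
import math
import random
from collections import Counter
from itertools import combinations


def simulate_one_coin(accuracies, n_items, n_classes=2, seed=0):
    """Hypothetical helper: simulate labels under a one-coin model.

    Worker j reports the true class with probability accuracies[j];
    otherwise it reports one of the other classes uniformly at random.
    Returns one list of labels per worker (one label per item).
    """
    rng = random.Random(seed)
    truth = [rng.randrange(n_classes) for _ in range(n_items)]
    labels = []
    for p in accuracies:
        worker = []
        for y in truth:
            if rng.random() < p:
                worker.append(y)
            else:
                # Draw from the n_classes - 1 wrong classes, skipping y.
                wrong = rng.randrange(n_classes - 1)
                worker.append(wrong if wrong < y else wrong + 1)
        labels.append(worker)
    return labels


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits from paired label sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def rank_workers_by_mi(labels):
    """Hypothetical scoring rule: mean pairwise MI with all other workers.

    Returns (order, scores); order lists worker indices, highest score first.
    """
    m = len(labels)
    totals = [0.0] * m
    for i, j in combinations(range(m), 2):
        mi = mutual_information(labels[i], labels[j])
        totals[i] += mi
        totals[j] += mi
    scores = [t / (m - 1) for t in totals]
    order = sorted(range(m), key=lambda k: scores[k], reverse=True)
    return order, scores


# Demo with made-up accuracies; workers 4 and 5 are near chance.
accuracies = [0.95, 0.9, 0.8, 0.7, 0.55, 0.5]
labels = simulate_one_coin(accuracies, n_items=20000, seed=1)
order, scores = rank_workers_by_mi(labels)
kept = order[:4]  # one possible (hypothetical) selection rule: keep the top k
print("workers, most to least informative:", order)
print("kept:", kept)
```

Why the order is well structured in the binary case: if the two classes are equally likely, two one-coin workers agree with probability q = p_i p_j + (1 - p_i)(1 - p_j), so q - 1/2 = 2(p_i - 1/2)(p_j - 1/2), and each worker's label is uniform on its own, so their mutual information is 1 - H_b(q) bits, which grows with |q - 1/2|. A worker with larger |2p_j - 1| therefore has higher population mutual information with every other worker, and hence a higher mean score. This short argument is our own illustration for the balanced binary case only.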
Taxonomy
Topics: Machine Learning and Algorithms · Data Stream Mining Techniques · Bayesian Methods and Mixture Models
