Feature Selection Facilitates Learning Mixtures of Discrete Product Distributions

Vincent Zhao; Steven W. Zucker

arXiv:1711.09195·stat.ML·November 28, 2017

TL;DR

This paper introduces a feature selection algorithm based on pairwise mutual information to improve learning of mixtures of discrete product distributions. In crowdsourcing, it identifies and removes unreliable workers so that learning is more robust.

Contribution

The paper proposes a feature selection method based on pairwise mutual information for mixtures of discrete product distributions. The method is justified by a goodness-of-fit measure and validated empirically.

Findings

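01 Improvement on real data sets can be substantial.

02 Pairwise mutual information induces an ordering over workers that is well structured for the 'one coin' model.

03 Eliminating the less reliable workers is expected to make mixture learning more robust.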

Abstract

Feature selection can facilitate the learning of mixtures of discrete random variables as they arise, e.g., in crowdsourcing tasks. Intuitively, not all workers are equally reliable but, if the less reliable ones could be eliminated, then learning should be more robust. By analogy with Gaussian mixture models, we seek a low-order statistical approach, and here introduce an algorithm based on the (pairwise) mutual information. This induces an order over workers that is well structured for the 'one coin' model. More generally, it is justified by a goodness-of-fit measure and is validated empirically. Improvement in real data sets can be substantial.
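The idea of ordering workers by pairwise mutual information can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the exact scoring rule, so the sketch assumes each worker is scored by the mean empirical mutual information with every other worker. The function names (`pairwise_mutual_information`, `rank_workers`, `simulate_one_coin`) and the simulated accuracies are hypothetical and chosen only for illustration.

```python
import numpy as np


def pairwise_mutual_information(x, y, n_labels):
    """Empirical mutual information (in nats) between two label vectors."""
    joint = np.zeros((n_labels, n_labels))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (n_labels, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, n_labels)
    mask = joint > 0                        # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))


def rank_workers(labels, n_labels):
    """Order workers by decreasing mean pairwise mutual information.

    labels: integer array of shape (n_items, n_workers).
    Assumption: the mean over all other workers is used as the score;
    the paper's actual criterion may differ.
    """
    n_workers = labels.shape[1]
    mi = np.zeros((n_workers, n_workers))
    for i in range(n_workers):
        for j in range(i + 1, n_workers):
            value = pairwise_mutual_information(labels[:, i], labels[:, j], n_labels)
            mi[i, j] = mi[j, i] = value
    scores = mi.sum(axis=1) / (n_workers - 1)
    order = np.argsort(-scores)
    return order, scores


def simulate_one_coin(n_items, accuracies, seed=0):
    """Binary 'one coin' model: worker w reports the true label with
    probability accuracies[w] and the flipped label otherwise."""
    rng = np.random.default_rng(seed)
    truth = rng.integers(0, 2, size=n_items)
    correct = rng.random((n_items, len(accuracies))) < np.asarray(accuracies)
    labels = np.where(correct, truth[:, None], 1 - truth[:, None])
    return truth, labels
```

Calling `simulate_one_coin(5000, [0.95, 0.9, 0.85, 0.5, 0.5])` and passing the resulting labels to `rank_workers(labels, n_labels=2)` should place the three informative workers ahead of the two that guess at random, since a worker whose answers are independent of the truth shares almost no information with anyone else. Dropping the tail of this ordering before fitting the mixture is the kind of feature selection the abstract describes.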

Taxonomy

Topics: Machine Learning and Algorithms · Data Stream Mining Techniques · Bayesian Methods and Mixture Models