Asymptotic Analysis of Generative Semi-Supervised Learning

Joshua V Dillon; Krishnakumar Balasubramanian; Guy Lebanon

arXiv:1003.0024 · cs.LG · March 2, 2010 · 4 cites

TL;DR

This paper provides an asymptotic analysis of generative semi-supervised learning, quantifying how labeling policies and data quantity affect model accuracy through a novel theoretical framework and empirical validation.

Contribution

It introduces an extension of stochastic composite likelihood to analyze the asymptotic accuracy of generative semi-supervised learning, addressing how to optimally allocate labeling efforts.

Findings

1. Quantifies the impact of labeling policies on accuracy.
2. Provides a framework to determine optimal data labeling strategies.
3. Validates findings with simulations and real-world NLP experiments.

Abstract

Semi-supervised learning has emerged as a popular framework for improving modeling accuracy while controlling labeling cost. Based on an extension of stochastic composite likelihood, we quantify the asymptotic accuracy of generative semi-supervised learning. In doing so, we complement distribution-free analysis by providing an alternative framework to measure the value associated with different labeling policies and resolve the fundamental question of how much data to label and in what manner. We demonstrate our approach with both simulation studies and real-world experiments using naive Bayes for text classification and MRFs and CRFs for structured prediction in NLP.
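
To make the "how much data to label" question concrete, here is a minimal simulation sketch. It is not taken from the paper. It assumes a toy generative model (two equally likely classes, unit-variance Gaussians with means -1 and +1) and a labeling policy that reveals each label independently with probability `lam`. The estimator maximizes a likelihood that uses the joint p(x, y) for labeled points and the marginal p(x) for unlabeled ones, fitted here with EM. All names (`simulate`, `log_likelihood`, `fit_em`, `TRUE_MU`) are hypothetical.

```python
import math
import random

# Assumed toy model: class means -1 and +1, unit variance, equal class priors.
TRUE_MU = (-1.0, 1.0)


def simulate(n, lam, seed=0):
    """Draw n samples; each label is revealed independently with probability lam."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randrange(2)
        x = rng.gauss(TRUE_MU[y], 1.0)
        data.append((x, y if rng.random() < lam else None))
    return data


def log_normal(x, mu):
    """Log density of a unit-variance Gaussian with mean mu."""
    return -0.5 * (x - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)


def log_likelihood(data, mu):
    """Joint log p(x, y) for labeled points, marginal log p(x) for unlabeled ones."""
    total = 0.0
    for x, y in data:
        if y is not None:
            total += math.log(0.5) + log_normal(x, mu[y])
        else:
            a, b = log_normal(x, mu[0]), log_normal(x, mu[1])
            m = max(a, b)  # log-sum-exp for numerical stability
            total += math.log(0.5) + m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def fit_em(data, iters=100):
    """EM for the two class means; labeled points keep their observed class."""
    xs = [x for x, _ in data]
    center = sum(xs) / len(xs)
    mu = [center - 1.0, center + 1.0]  # simple ordered initialization
    for _ in range(iters):
        num = [0.0, 0.0]
        den = [0.0, 0.0]
        for x, y in data:
            if y is not None:
                r1 = float(y)  # responsibility is fixed by the observed label
            else:
                d = log_normal(x, mu[0]) - log_normal(x, mu[1])
                d = max(min(d, 700.0), -700.0)  # guard against overflow in exp
                r1 = 1.0 / (1.0 + math.exp(d))  # posterior p(y = 1 | x)
            num[1] += r1 * x
            den[1] += r1
            num[0] += (1.0 - r1) * x
            den[0] += 1.0 - r1
        mu = [num[k] / den[k] if den[k] > 0 else mu[k] for k in range(2)]
    return tuple(mu)


if __name__ == "__main__":
    # Empirical squared error of the estimated means for several labeling rates.
    for lam in (0.1, 0.5, 1.0):
        errs = []
        for trial in range(20):
            est = fit_em(simulate(500, lam, seed=trial))
            errs.append(sum((e - t) ** 2 for e, t in zip(est, TRUE_MU)))
        print(f"lam={lam:.1f}  mean squared error={sum(errs) / len(errs):.4f}")
```

Running the script prints an empirical mean squared error for each labeling rate, a rough finite-sample counterpart to the asymptotic accuracy the abstract refers to. The model, sample sizes, and labeling rates are illustrative choices only.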

Taxonomy

Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Bayesian Methods and Mixture Models