Analysis of High-dimensional Gaussian Labeled-unlabeled Mixture Model via Message-passing Algorithm

Xiaosi Gu; Tomoyuki Obuchi

arXiv:2411.19553 · cs.LG · March 14, 2025

Open Access

TL;DR

This paper analyzes the high-dimensional Gaussian mixture model for semi-supervised binary classification using approximate message passing, showing how regularization affects estimation and prediction accuracy.

Contribution

It provides a detailed theoretical analysis of the Gaussian mixture model (GMM) in semi-supervised learning (SSL), comparing the Bayesian approach with regularized maximum likelihood estimation (RMLE) via state evolution.

Findings

01. RMLE achieves near-optimal performance when sufficient unlabeled data is available.

02. Regularization significantly improves estimation and prediction accuracy.

03. The Bayes-optimal estimator serves as a benchmark for performance evaluation.

Abstract

Semi-supervised learning (SSL) is a machine learning methodology that leverages unlabeled data in conjunction with a limited amount of labeled data. Although SSL has been applied in various applications and its effectiveness has been empirically demonstrated, it is still not fully understood when and why SSL performs well. Some existing theoretical studies have attempted to address this issue by modeling classification problems using the so-called Gaussian Mixture Model (GMM). These studies provide notable and insightful interpretations. However, their analyses are focused on specific purposes, and a thorough investigation of the properties of GMM in the context of SSL has been lacking. In this paper, we conduct such a detailed analysis of the properties of the high-dimensional GMM for binary classification in the SSL setting. To this end, we employ the approximate message passing and…
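To make the labeled-unlabeled setting concrete, the following is a minimal sketch rather than the paper's method. It assumes a toy symmetric two-cluster model (means `+mu` and `-mu`, identity covariance, equal class priors) and fits a ridge-regularized maximum likelihood estimate with a plain EM-style fixed-point iteration, not the approximate message passing algorithm the abstract refers to. The names `sample_gmm`, `fit_rmle`, and `lam`, and all parameter values, are hypothetical choices made for illustration.

```python
import math
import random


def sample_gmm(n_labeled, n_unlabeled, mu, rng):
    """Draw x = y * mu + standard Gaussian noise, with y uniform on {-1, +1}.

    Assumed toy model: symmetric means +/- mu and identity covariance.
    """
    def draw():
        y = rng.choice((-1, 1))
        x = [y * m + rng.gauss(0.0, 1.0) for m in mu]
        return x, y

    labeled = [draw() for _ in range(n_labeled)]
    unlabeled = [draw()[0] for _ in range(n_unlabeled)]  # labels discarded
    return labeled, unlabeled


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def fit_rmle(labeled, unlabeled, lam, n_iter=50):
    """Ridge-regularized ML estimate of mu via an EM-style fixed point.

    Stationarity of the penalized log-likelihood under the assumed model gives
    w = (sum_labeled y*x + sum_unlabeled tanh(w.x)*x) / (n_total + lam).
    """
    d = len(labeled[0][0])
    # Labeled part of the sufficient statistic: sum_i y_i * x_i.
    s_lab = [0.0] * d
    for x, y in labeled:
        for k in range(d):
            s_lab[k] += y * x[k]
    n_total = len(labeled) + len(unlabeled)
    # Initialize from the labeled-only ridge estimate.
    w = [s / (len(labeled) + lam) for s in s_lab]
    for _ in range(n_iter):
        s = list(s_lab)
        for x in unlabeled:
            soft_label = math.tanh(dot(w, x))  # posterior mean of y given x
            for k in range(d):
                s[k] += soft_label * x[k]
        w = [sk / (n_total + lam) for sk in s]
    return w


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


rng = random.Random(0)
mu = [0.5] * 20  # hypothetical true mean vector
labeled, unlabeled = sample_gmm(20, 400, mu, rng)
w_sup = fit_rmle(labeled, [], lam=1.0)         # labeled data only
w_ssl = fit_rmle(labeled, unlabeled, lam=1.0)  # labeled + unlabeled
print(cosine(w_sup, mu), cosine(w_ssl, mu))
```

The two printed values are the cosine overlaps of each estimate with the true mean. Comparing them gives a rough sense of how much the unlabeled samples help in this toy setup.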

Taxonomy

Topics: Bayesian Methods and Mixture Models