Semi-Supervised Learning under General Causal Models

Archer Moore; Heejung Shim; Jingge Zhu; Mingming Gong

arXiv:2510.22567 · stat.ML · October 28, 2025

TL;DR

This paper introduces a semi-supervised learning framework based on general causal models, leveraging unlabelled data to learn causal structures and generate synthetic labelled data, thereby improving prediction accuracy.

Contribution

It proposes a novel SSL approach that models complex causal relations and uses unlabelled data to learn causal generative models for synthetic data generation.
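The idea of learning a generative model with the help of unlabelled data, then sampling synthetic labelled data to train a predictor, can be illustrated in its simplest anticausal case, Y → X. The sketch below assumes a two-class, one-dimensional Gaussian model fitted by semi-supervised EM. It is a toy illustration only and not the authors' method, which handles general causal graphs. All function names (`fit_ssl_gaussian_model`, `sample_synthetic`, `fit_nearest_centroid`) and the toy data are hypothetical.

```python
import numpy as np


def fit_ssl_gaussian_model(x_lab, y_lab, x_unlab, n_iter=100):
    """Fit P(Y) and P(X | Y) for a toy Y -> X model by semi-supervised EM.

    Assumed model (hypothetical, for illustration): Y in {0, 1} and
    X | Y = k ~ N(mu_k, var) with a variance shared by both classes.
    Labelled points keep fixed one-hot responsibilities; unlabelled points
    receive soft responsibilities P(Y | X) from the current model.
    """
    x_all = np.concatenate([x_lab, x_unlab])
    r_lab = np.eye(2)[y_lab]
    # Initialise from the labelled data only.
    prior = r_lab.mean(axis=0)
    mu = np.array([x_lab[y_lab == k].mean() for k in range(2)])
    var = x_lab.var()
    for _ in range(n_iter):
        # E-step: log P(Y=k | x) up to a constant; the Gaussian normaliser
        # cancels because the variance is shared across classes.
        log_post = np.log(prior) - (x_unlab[:, None] - mu) ** 2 / (2 * var)
        log_post -= log_post.max(axis=1, keepdims=True)
        r_unlab = np.exp(log_post)
        r_unlab /= r_unlab.sum(axis=1, keepdims=True)
        r = np.vstack([r_lab, r_unlab])
        # M-step: re-estimate the cause P(Y) and the mechanism P(X | Y).
        n_k = r.sum(axis=0)
        prior = n_k / n_k.sum()
        mu = (r * x_all[:, None]).sum(axis=0) / n_k
        var = (r * (x_all[:, None] - mu) ** 2).sum() / n_k.sum()
    return prior, mu, var


def sample_synthetic(prior, mu, var, n, rng):
    """Generate labelled pairs by ancestral sampling: Y ~ P(Y), then X ~ P(X | Y)."""
    y = rng.choice(2, size=n, p=prior)
    x = rng.normal(mu[y], np.sqrt(var))
    return x, y


def fit_nearest_centroid(x, y):
    """Train a simple classifier (nearest class mean) on labelled data."""
    centroids = np.array([x[y == k].mean() for k in range(2)])
    return lambda x_new: np.argmin(np.abs(x_new[:, None] - centroids), axis=1)


# Toy data: true mechanism X | Y=k ~ N(true_mu[k], 1).
rng = np.random.default_rng(0)
true_mu = np.array([-2.0, 2.0])
y_lab = np.repeat([0, 1], 5)                  # only 10 labelled points
x_lab = rng.normal(true_mu[y_lab], 1.0)
y_hidden = rng.integers(0, 2, size=2000)      # these labels are discarded
x_unlab = rng.normal(true_mu[y_hidden], 1.0)

prior, mu, var = fit_ssl_gaussian_model(x_lab, y_lab, x_unlab)
x_syn, y_syn = sample_synthetic(prior, mu, var, 5000, rng)
clf = fit_nearest_centroid(x_syn, y_syn)

y_test = rng.integers(0, 2, size=2000)
x_test = rng.normal(true_mu[y_test], 1.0)
accuracy = (clf(x_test) == y_test).mean()
print(f"mu={mu.round(2)}, var={var:.2f}, test accuracy={accuracy:.3f}")
```

Here most of the information about the class means comes from the 2000 unlabelled points, which is only possible because the label is the cause: the marginal of X is a mixture of the mechanisms P(X | Y = k).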

Findings

01. Effective on simulated data
02. Improves prediction accuracy
03. Validated on real datasets

Abstract

Semi-supervised learning (SSL) aims to train a machine learning model using both labelled and unlabelled data. While unlabelled data have been used in various ways to improve prediction accuracy, why they help is not fully understood. One interesting and promising direction is to understand SSL from a causal perspective. In light of the independent causal mechanisms principle, unlabelled data can be helpful when the label causes the features but not vice versa. However, the causal relations between the features and labels can be complex in real-world applications. In this paper, we propose an SSL framework that works with general causal models in which the variables have flexible causal relations. More specifically, we explore the causal graph structures and design corresponding causal generative models which can be learned with the help of…
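The independent causal mechanisms argument can be illustrated with the standard factorisations of the joint distribution. This is a textbook sketch of the two simplest graphs, not the paper's general framework:

```latex
\begin{aligned}
&\text{Causal } (X \to Y): && P(X, Y) = P(X)\,P(Y \mid X) \\
&\text{Anticausal } (Y \to X): && P(X, Y) = P(Y)\,P(X \mid Y), \qquad
  P(X) = \sum_{y} P(Y = y)\,P(X \mid Y = y), \\
& && P(Y \mid X) = \frac{P(Y)\,P(X \mid Y)}{P(X)}
\end{aligned}
```

In the causal case, the principle treats P(X) and P(Y | X) as independent mechanisms, so extra unlabelled draws of X sharpen the estimate of P(X) but carry no information about the predictor P(Y | X). In the anticausal case, P(X) is a mixture of the mechanisms P(X | Y = y), so unlabelled data constrain P(X | Y) and, through Bayes' rule, P(Y | X).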
