Co-training for Demographic Classification Using Deep Learning from Label Proportions
Ehsan Mohammady Ardehaly, Aron Culotta

TL;DR
This paper introduces a deep learning method for demographic classification from unlabeled data using label proportions, with a novel regularization layer and a co-training algorithm that enhances accuracy in social media analysis.
Contribution
It presents a new deep neural network framework for learning from label proportions and a co-training algorithm for multi-view data, improving demographic classification without user-level annotations.
Findings
Deep LLP outperforms baselines on demographic classification tasks.
Co-training improves image and text classification by 4% and 8% absolute F1, respectively.
An ensemble of the text and image classifiers further boosts F1 by 4% on average.
Abstract
Deep learning algorithms have recently produced state-of-the-art accuracy in many classification tasks, but this success is typically dependent on access to many annotated training examples. For domains without such data, an attractive alternative is to train models with light, or distant, supervision. In this paper, we introduce a deep neural network for the Learning from Label Proportions (LLP) setting, in which the training data consist of bags of unlabeled instances with associated label distributions for each bag. We introduce a new regularization layer, Batch Averager, that can be appended to the last layer of any deep neural network to convert it from supervised learning to LLP. This layer can be implemented readily with existing deep learning packages. To further support domains in which the data consist of two conditionally independent feature views (e.g. image and text), we…
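The Batch Averager idea in the abstract (average the network's per-instance outputs over a bag, then compare that average with the bag's known label distribution) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the toy logits, and the choice of KL divergence as the bag-level loss are all assumptions made for illustration, since this summary does not state the exact loss.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def batch_averager(probs):
    """Average per-instance class probabilities over one bag (axis 0)."""
    return probs.mean(axis=0)

def llp_loss(bag_logits, bag_proportions, eps=1e-12):
    """KL(target proportions || averaged prediction) for a single bag.

    Assumption: KL divergence is used here for illustration only.
    """
    avg = batch_averager(softmax(bag_logits))
    target = np.asarray(bag_proportions, dtype=float)
    return float(np.sum(target * (np.log(target + eps) - np.log(avg + eps))))

# Hypothetical bag: 4 unlabeled instances, 2 classes.
# In a real model these logits would come from the network's last layer.
bag_logits = np.array([[5.0, -5.0], [5.0, -5.0], [-5.0, 5.0], [-5.0, 5.0]])

loss_match = llp_loss(bag_logits, [0.5, 0.5])     # predictions agree with the bag
loss_mismatch = llp_loss(bag_logits, [0.9, 0.1])  # predictions disagree with the bag
```

In this sketch no individual instance carries a label; the only supervision is the bag-level distribution, so the loss is near zero when the averaged prediction matches the stated proportions and grows as the two diverge.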
