Scalable Teacher Forcing Network for Semi-Supervised Large Scale Data Streams
Mahardhika Pratama, Choiru Za'in, Edwin Lughofer, Eric Pardede, Dwi A. P. Rahayu

TL;DR
This paper introduces WeScatterNet, a scalable semi-supervised learning framework for large-scale data streams that effectively handles label scarcity and concept drift using distributed computing and data augmentation techniques.
Contribution
WeScatterNet is a novel distributed semi-supervised network that addresses large-scale data streams with limited labels and concept drift, integrating model fusion and data augmentation.
Findings
Achieves high performance with only 25% of samples labelled.
Outperforms fully supervised methods on large-scale data streams.
Handles concept drift and label scarcity effectively.
Abstract
The large-scale data stream problem refers to a high-speed information flow that cannot be processed in a scalable manner on a traditional computing platform. This problem also imposes expensive labelling costs, making the deployment of fully supervised algorithms infeasible. Meanwhile, semi-supervised learning for large-scale data streams is little explored in the literature, because most works are designed for traditional single-node computing environments and are fully supervised. This paper offers the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and large-scale data streams simultaneously. WeScatterNet is built on the distributed computing platform of Apache Spark, with a data-free model fusion strategy for model compression after the parallel computing stage. It features an open network…
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Advanced Control Systems Optimization
