Learning Graphical Models from a Distributed Stream
Yu Zhang, Srikanta Tirthapura, Graham Cormode

TL;DR
This paper introduces a communication-efficient method for continuously learning and maintaining Bayesian network models over distributed streaming data, significantly reducing communication costs while maintaining accuracy.
Contribution
The paper presents a strategy for maintaining Bayesian network parameters that yields an exponential reduction in communication in distributed streaming environments.
Findings
Achieves an exponential reduction in communication compared to baseline methods.
Maintains similar prediction accuracy for target distributions and classification tasks.
Supports scalable, real-time Bayesian network learning over distributed streams.
Abstract
A current challenge for data management systems is to support the construction and maintenance of machine learning models over data that is large, multi-dimensional, and evolving. While systems that could support these tasks are emerging, the need to scale to distributed, streaming data requires new models and algorithms. In this setting, as well as computational scalability and model accuracy, we also need to minimize the amount of communication between distributed processors, which is the chief component of latency. We study Bayesian networks, the workhorse of graphical models, and present a communication-efficient method for continuously learning and maintaining a Bayesian network model over data that is arriving as a distributed stream partitioned across multiple processors. We show a strategy for maintaining model parameters that leads to an exponential reduction in communication…
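To make the communication trade-off concrete, here is a minimal sketch of one way a coordinator could track counts approximately over a distributed stream. It is an illustration under stated assumptions, not the algorithm from the paper: the abstract does not specify the mechanism, so this uses a simple deterministic threshold rule in which a site reports its local count only when it has grown by more than a factor of (1 + eps) since the last report. The class `ApproxDistributedCounter`, the function `simulate`, and all parameters are hypothetical names chosen for this example. The link to Bayesian networks is the standard fact that the maximum-likelihood estimate of a conditional probability is a ratio of two counts, so approximate counts give an approximate parameter.

```python
import random


class ApproxDistributedCounter:
    """Hypothetical sketch: one logical counter spread over several sites.

    A site messages the coordinator only when its local count exceeds
    (1 + eps) times the value it last reported, so the coordinator's
    estimate stays within a (1 + eps) factor of the true total.
    """

    def __init__(self, num_sites, eps):
        self.eps = eps
        self.local = [0] * num_sites      # exact counts, known only to each site
        self.reported = [0] * num_sites   # last values sent to the coordinator
        self.messages = 0                 # number of site-to-coordinator messages

    def increment(self, site):
        self.local[site] += 1
        # Report only when the local count has drifted too far from the last report.
        if self.local[site] > (1 + self.eps) * self.reported[site]:
            self.reported[site] = self.local[site]
            self.messages += 1

    def estimate(self):
        """Coordinator's view: sum of the last reported values."""
        return sum(self.reported)

    def exact(self):
        """Ground truth, available here only because this is a simulation."""
        return sum(self.local)


def simulate(num_events=100_000, num_sites=10, eps=0.1, p_true=0.3, seed=0):
    """Stream events to random sites and track the two counts needed for
    one conditional probability P(X=1 | U=u) = count(X=1, U=u) / count(U=u)."""
    rng = random.Random(seed)
    joint = ApproxDistributedCounter(num_sites, eps)   # count(X=1, U=u)
    parent = ApproxDistributedCounter(num_sites, eps)  # count(U=u)
    for _ in range(num_events):
        site = rng.randrange(num_sites)
        parent.increment(site)
        if rng.random() < p_true:
            joint.increment(site)
    return joint, parent


if __name__ == "__main__":
    joint, parent = simulate()
    print("exact-counting messages:", parent.exact() + joint.exact())
    print("threshold-rule messages:", parent.messages + joint.messages)
    print("estimated P(X=1 | U=u): ", joint.estimate() / parent.estimate())
    print("exact     P(X=1 | U=u): ", joint.exact() / parent.exact())
```

Under this rule, each site sends a number of messages that grows roughly logarithmically with its local count, whereas forwarding every event costs one message per event. The price is a bounded relative error: each estimated count is within a factor of (1 + eps) of the truth, so a ratio of two such estimates is within a factor of (1 + eps) of the exact conditional probability in either direction.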
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Stream Mining Techniques · Machine Learning and Data Classification
