Mining Massive Hierarchical Data Using a Scalable Probabilistic   Graphical Model

Khalifeh AlJadda; Mohammed Korayem; Camilo Ortiz; Trey Grainger; John; A. Miller; Khaled Rasheed; Krys J. Kochut; William S. York; Rene Ranzinger,; Melody Porterfield

arXiv:1512.08525·cs.AI·December 31, 2015

Mining Massive Hierarchical Data Using a Scalable Probabilistic Graphical Model

Khalifeh AlJadda, Mohammed Korayem, Camilo Ortiz, Trey Grainger, John, A. Miller, Khaled Rasheed, Krys J. Kochut, William S. York, Rene Ranzinger,, Melody Porterfield

PDF

TL;DR

This paper introduces a scalable extension to Bayesian Networks for modeling massive hierarchical data, enabling efficient analysis of large-scale, multi-level datasets with high accuracy in classification and semantic relationship prediction.

Contribution

The paper presents a novel scalable probabilistic graphical model extension specifically designed for large hierarchical data, overcoming the limitations of traditional Bayesian Networks in big data contexts.

Findings

01

Achieved perfect precision of 1.0 in multi-label classification of mass spectrometry data.

02

Predicted semantic relationships with up to 0.80 accuracy on 1.5 billion search logs.

03

Demonstrated scalability and effectiveness on large hierarchical datasets.

Abstract

Probabilistic Graphical Models (PGM) are very useful in the fields of machine learning and data mining. The crucial limitation of those models,however, is the scalability. The Bayesian Network, which is one of the most common PGMs used in machine learning and data mining, demonstrates this limitation when the training data consists of random variables, each of them has a large set of possible values. In the big data era, one would expect new extensions to the existing PGMs to handle the massive amount of data produced these days by computers, sensors and other electronic devices. With hierarchical data - data that is arranged in a treelike structure with several levels - one would expect to see hundreds of thousands or millions of values distributed over even just a small number of levels. When modeling this kind of hierarchical data across large data sets, Bayesian Networks become…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.