Towards Expectation-Maximization by SQL in RDBMS
Kangfei Zhao, Jeffrey Xu Yu, Yu Rong, Ming Liao, Junzhou Huang

TL;DR
This paper presents an SQL-based approach to implement Expectation-Maximization algorithms within relational database systems, enabling probabilistic modeling directly in RDBMSs for applications like clustering and data summarization.
Contribution
It introduces a novel SQL solution supporting EM algorithms in RDBMSs, including matrix representations, relational algebra operations, and a mechanism for automatic model maintenance.
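As a rough illustration of what a matrix representation with relational operations can look like, the sketch below stores each matrix as a relation of `(i, j, v)` triples and computes a matrix product with a join followed by a grouped sum. The table names `A` and `B`, the helper `matmul_sql`, and the use of SQLite through Python's `sqlite3` module are assumptions made for this example; the summary does not state which representation or RDBMS the paper uses.

```python
import sqlite3


def matmul_sql(a_rows, b_rows):
    """Multiply two matrices stored as (row, column, value) triples.

    Hypothetical helper for illustration: the join pairs A's column index
    with B's row index, and SUM over each (A.i, B.j) group yields one
    entry of the product.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (i INTEGER, j INTEGER, v REAL)")
    conn.execute("CREATE TABLE B (i INTEGER, j INTEGER, v REAL)")
    conn.executemany("INSERT INTO A VALUES (?, ?, ?)", a_rows)
    conn.executemany("INSERT INTO B VALUES (?, ?, ?)", b_rows)
    rows = conn.execute(
        """
        SELECT A.i, B.j, SUM(A.v * B.v) AS v
        FROM A JOIN B ON A.j = B.i
        GROUP BY A.i, B.j
        ORDER BY A.i, B.j
        """
    ).fetchall()
    conn.close()
    return rows


# Two 2x2 matrices: [[1, 2], [3, 4]] and [[5, 6], [7, 8]].
a = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
b = [(0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)]
product = matmul_sql(a, b)
```

Here `product` holds the entries of the 2x2 result as `(i, j, v)` rows, so the output stays in the same relational form as the inputs and can feed another query.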
Findings
Successful implementation of EM in SQL within RDBMSs
Demonstrated model maintenance when data changes
Experimental validation of the approach
Abstract
Integrating machine learning techniques into RDBMSs is an important task, since many real applications require modeling (e.g., business intelligence, strategic analysis) as well as querying data in RDBMSs. In this paper, we provide an SQL solution that has the potential to support different machine learning models. As an example, we study how to support unsupervised probabilistic modeling, which has a wide range of applications in clustering, density estimation and data summarization, and focus on Expectation-Maximization (EM) algorithms, a general technique for finding maximum likelihood estimators. To train a model by EM, the model parameters are updated by an E-step and an M-step in a while-loop, iterating until the model converges to a level controlled by some threshold or a certain number of iterations has been reached. To support EM in RDBMSs, we show our answers…
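The while-loop described in the abstract can be sketched outside the database for a concrete case. The example below runs EM on a one-dimensional Gaussian mixture in plain Python, stopping when the change in log-likelihood falls below a threshold or after a maximum number of iterations. It is a minimal sketch of the generic EM loop, not the paper's SQL implementation; the function names, the choice of a Gaussian mixture, the default `threshold` and `max_iters` values, and the toy data are all assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, threshold=1e-6, max_iters=200):
    """Fit a 1-D Gaussian mixture by EM (hypothetical helper for illustration)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    prev_ll = -math.inf
    iters = 0
    while iters < max_iters:
        # E-step: responsibility of each component for each point,
        # plus the log-likelihood under the current parameters.
        resp = []
        ll = 0.0
        for x in data:
            joint = [weights[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])

        # M-step: re-estimate parameters from the weighted points.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var_c = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var_c, 1e-6)  # floor to avoid a degenerate component
            weights[c] = n_c / len(data)

        iters += 1
        # Stop once the log-likelihood improvement is below the threshold.
        if abs(ll - prev_ll) < threshold:
            break
        prev_ll = ll
    return means, variances, weights, iters


# Toy data with two well-separated groups around 1.0 and 5.0.
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
means, variances, weights, iters = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

On this toy input the two estimated means settle near the two groups of points, and the loop exits through the threshold test well before `max_iters`.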
Taxonomy
Topics: Advanced Database Systems and Queries · Time Series Analysis and Forecasting · Data Quality and Management
