Efficient Construction of Nonlinear Models over Normalized Data
Zhaoyue Chen, Nick Koudas, Zhe Zhang, Xiaohui Yu

TL;DR
This paper introduces efficient methods for training nonlinear machine learning models, specifically GMMs and neural networks, directly over normalized relational data, significantly reducing computation time without sacrificing accuracy.
Contribution
It presents novel algorithms for decomposing and factorizing GMM and neural network training over normalized data, enabling faster training compared to traditional approaches.
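To make "factorizing training over normalized data" concrete, here is a minimal sketch of the generic idea for the first layer of a neural network. It is not necessarily the paper's algorithm. It assumes a binary primary/foreign-key join between a fact table `S` and a dimension table `R`, with the join vector `fk` mapping each `S` row to its `R` row. All names (`S`, `R`, `fk`, `W_S`, `W_R`) are hypothetical. Because a joined row is `[s, r]`, the layer's linear map splits as `W [s; r] = W_S s + W_R r`. The `R`-side product can therefore be computed once per `R` tuple, and the gradient for `W_R` can be aggregated per `R` tuple.

```python
import numpy as np

# Hypothetical setup: S is (n_S, d_S), R is (n_R, d_R), fk[i] indexes the R row joined with S[i].

def forward_materialized(S, fk, R, W_S, W_R, b):
    # Denormalize first: row i of X is S[i] concatenated with R[fk[i]].
    X = np.hstack([S, R[fk]])
    W = np.hstack([W_S, W_R])                 # (hidden, d_S + d_R)
    return X @ W.T + b                        # pre-activations, (n_S, hidden)

def forward_factorized(S, fk, R, W_S, W_R, b):
    # R-side product computed once per R tuple, then gathered through the foreign key.
    R_part = R @ W_R.T                        # (n_R, hidden)
    return S @ W_S.T + R_part[fk] + b

def grad_W_R_materialized(dZ, fk, R):
    # dL/dW_R = sum over joined rows i of dZ[i] outer R[fk[i]].
    return dZ.T @ R[fk]

def grad_W_R_factorized(dZ, fk, R):
    # Sum upstream gradients of all S rows sharing an R tuple, then multiply once per R tuple.
    agg = np.zeros((R.shape[0], dZ.shape[1]))
    np.add.at(agg, fk, dZ)                    # unbuffered scatter-add by foreign key
    return agg.T @ R                          # (hidden, d_R)
```

When each `R` tuple is referenced by many `S` tuples (`n_R` much smaller than `n_S`), the `R`-side work drops from O(n_S · d_R · hidden) to O(n_R · d_R · hidden), plus an O(n_S · hidden) gather or scatter.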
Findings
Training speed improved substantially, by 100% or more in some settings.
No loss in model accuracy with the proposed methods.
Performance gains increase with data complexity.
Abstract
Machine Learning (ML) applications are proliferating in the enterprise. Relational data, which are prevalent in enterprise applications, are typically normalized; as a result, the data have to be denormalized via primary/foreign-key joins before they can be provided as input to ML algorithms. In this paper, we study the implementation of popular nonlinear ML models, Gaussian Mixture Models (GMM) and Neural Networks (NN), over normalized data, addressing both binary and multi-way joins over normalized relations. For GMM, we show how to decompose the computation systematically, for both binary and multi-way joins, to construct mixture models. We demonstrate that by factoring the computation, one can train the models much faster than with other applicable approaches, without any loss in accuracy. For the case of NN, we propose algorithms to…
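As an illustration of how GMM computation can be decomposed over a binary join, the sketch below computes the M-step statistics (responsibility-weighted mean and covariance of one component) without materializing the join. It shows the generic factorized-aggregation idea under stated assumptions and is not claimed to be the paper's exact algorithm. It assumes a fact table `S`, a dimension table `R`, a foreign-key vector `fk`, and per-tuple responsibilities `g` for a single component that were already computed in the E-step. All names are hypothetical. Each joined row is `x = [s, r]`, so the weighted second moment splits into an S-S block, an S-R block, and an R-R block. The last two blocks need only per-`R`-tuple aggregates.

```python
import numpy as np

# Hypothetical setup: S is (n_S, d_S), R is (n_R, d_R), fk[i] indexes the R row joined
# with S[i], and g[i] >= 0 is the responsibility of one mixture component for joined row i.

def weighted_stats_materialized(S, fk, R, g):
    X = np.hstack([S, R[fk]])                  # denormalized join, (n_S, d_S + d_R)
    n = g.sum()
    mu = (g @ X) / n
    cov = (X * g[:, None]).T @ X / n - np.outer(mu, mu)
    return mu, cov

def weighted_stats_factorized(S, fk, R, g):
    n_R = R.shape[0]
    n = g.sum()
    # G[j]: total responsibility of the S rows that reference R tuple j.
    G = np.bincount(fk, weights=g, minlength=n_R)
    # A[j]: responsibility-weighted sum of the S rows that reference R tuple j.
    A = np.zeros((n_R, S.shape[1]))
    np.add.at(A, fk, S * g[:, None])
    mu = np.concatenate([g @ S, G @ R]) / n
    SS = (S * g[:, None]).T @ S                # S-only block still touches every S row
    SR = A.T @ R                               # cross block from n_R aggregated rows
    RR = (R * G[:, None]).T @ R                # R-only block from n_R rows instead of n_S
    second = np.block([[SS, SR], [SR.T, RR]]) / n
    return mu, second - np.outer(mu, mu)
```

The E-step decomposes in a similar way: the Gaussian quadratic form over `[s, r]` splits into an S-only term, an R-only term, and a cross term, so the R-only term can be evaluated once per `R` tuple and reused for every `S` tuple that references it.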
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Bayesian Methods and Mixture Models · Neural Networks and Applications
