Efficient Construction of Nonlinear Models over Normalized Data

Zhaoyue Chen; Nick Koudas; Zhe Zhang; Xiaohui Yu

arXiv:2011.11682·cs.LG·March 22, 2021

Open Access

TL;DR

This paper introduces efficient methods for training nonlinear machine learning models, specifically GMMs and neural networks, directly over normalized relational data, significantly reducing computation time without sacrificing accuracy.

Contribution

The paper presents novel algorithms that decompose and factorize GMM and neural network training over normalized data, enabling faster training than traditional approaches.
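To make the idea of factorizing over normalized data concrete, here is a minimal sketch for a single hidden unit's pre-activation. It is not the paper's algorithm. It only illustrates the general observation that a linear term over a joined row splits into a part that depends on the fact table and a part that depends on the referenced table, so the second part can be computed once per referenced tuple and reused. The tables `S` and `R`, the weights, and all function names are hypothetical.

```python
# Hypothetical toy data (illustrative only).
# R is the referenced table: primary key -> feature vector x_R.
R = {
    "r1": [1.0, 2.0],
    "r2": [0.5, -1.0],
}
# S is the fact table: (foreign key into R, feature vector x_S).
S = [
    ("r1", [3.0]),
    ("r2", [4.0]),
    ("r1", [5.0]),
]

# Weights of one hidden unit, split by the table each input column comes from.
w_S = [0.2]
w_R = [0.1, -0.3]
bias = 0.05


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def preactivations_materialized(S, R):
    """Baseline: build each joined row [x_S, x_R], then compute w . x + b."""
    out = []
    for fk, x_s in S:
        x = x_s + R[fk]  # denormalized row
        out.append(dot(w_S + w_R, x) + bias)
    return out


def preactivations_factorized(S, R):
    """Compute w_R . x_R once per R tuple and reuse it for every matching S row."""
    partial_R = {pk: dot(w_R, x_r) for pk, x_r in R.items()}
    return [dot(w_S, x_s) + partial_R[fk] + bias for fk, x_s in S]


if __name__ == "__main__":
    print(preactivations_materialized(S, R))
    print(preactivations_factorized(S, R))
```

Both functions return the same values up to floating-point rounding. The factorized version does less work whenever many rows of `S` reference the same tuple of `R`, because the `R` contribution is not recomputed for each of them.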

Findings

01

Training speed improved, with gains reaching 100% or more.

02

No loss in model accuracy with the proposed methods.

03

Performance gains increase with data complexity.

Abstract

Machine Learning (ML) applications are proliferating in the enterprise. Relational data, which are prevalent in enterprise applications, are typically normalized; as a result, data have to be denormalized via primary/foreign-key joins to be provided as input to ML algorithms. In this paper, we study the implementation of popular nonlinear ML models, Gaussian Mixture Models (GMM) and Neural Networks (NN), over normalized data, addressing both binary and multi-way joins over normalized relations. For GMM, we show how to decompose the computation systematically, for both binary and multi-way joins, to construct mixture models. We demonstrate that by factoring the computation, one can train the models much faster than with other applicable approaches, without any loss in accuracy. For the case of NN, we propose algorithms to…
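One way to picture the decomposition for GMM over a binary join is through the quadratic form (x - μ)ᵀ P (x - μ) that appears in each Gaussian component's density, where P is the inverse covariance. The sketch below is an illustration under stated assumptions and not the authors' procedure: it assumes a joined row is x = [x_S, x_R], that P is symmetric, and that P is partitioned into blocks P_SS, P_SR, and P_RR. The terms that depend only on x_R are then computed once per tuple of R and cached. All data, names, and helper functions are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in M]


def sub(u, v):
    """Elementwise difference u - v."""
    return [a - b for a, b in zip(u, v)]


# Hypothetical toy data: R maps primary key -> x_R, S holds (foreign key, x_S).
R = {"r1": [1.0, 0.0], "r2": [2.0, -1.0]}
S = [("r1", [2.0]), ("r2", [0.5]), ("r1", [-1.0])]

# One component's mean and symmetric precision matrix (inverse covariance),
# with columns ordered [S features, R features]. Values are illustrative.
mu = [1.0, 0.0, -1.0]
P = [
    [2.0, 0.5, 0.1],
    [0.5, 1.0, 0.2],
    [0.1, 0.2, 3.0],
]
n_S = 1  # number of feature columns that come from S

# Block partition of P and mu.
P_SS = [row[:n_S] for row in P[:n_S]]
P_SR = [row[n_S:] for row in P[:n_S]]
P_RR = [row[n_S:] for row in P[n_S:]]
mu_S, mu_R = mu[:n_S], mu[n_S:]


def quad_materialized(S, R):
    """Baseline: join first, then evaluate the full quadratic form per row."""
    out = []
    for fk, x_s in S:
        d = sub(x_s + R[fk], mu)
        out.append(dot(d, matvec(P, d)))
    return out


def quad_factorized(S, R):
    """Evaluate the same quadratic form with the R-only terms cached per R tuple."""
    cache = {}
    for pk, x_r in R.items():
        d_r = sub(x_r, mu_R)
        # P_SR d_R (for the cross term) and d_R^T P_RR d_R depend only on R.
        cache[pk] = (matvec(P_SR, d_r), dot(d_r, matvec(P_RR, d_r)))
    out = []
    for fk, x_s in S:
        d_s = sub(x_s, mu_S)
        cross, rr = cache[fk]
        # Symmetry of P makes the two cross terms equal, hence the factor 2.
        out.append(dot(d_s, matvec(P_SS, d_s)) + 2.0 * dot(d_s, cross) + rr)
    return out


if __name__ == "__main__":
    print(quad_materialized(S, R))
    print(quad_factorized(S, R))
```

The two functions agree up to floating-point rounding, which is consistent with the abstract's claim that factoring the computation changes the cost and not the result.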

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Bayesian Methods and Mixture Models · Neural Networks and Applications