Online Heterogeneous Mixture Learning for Big Data
Kazuki Seshimo, Akira Ota, Daichi Nishio, Satoshi Yamane

Abstract
We propose an online machine learning method for analyzing big data with heterogeneity. In an experiment, we compared the accuracy at each iteration of the batch and online methods. The online method converges quickly to the same accuracy as the batch method.
TABLE 1: Settings of the synthetic dataset
| Parameter | Value |
|---|---|
| Number of data points | 10,000 |
| Number of mixture components | 4 |
| Mixing coefficients | 0.1, 0.2, 0.3, 0.4 |
| Means, covariances | Random |
| Number of dimensions | 10 |
Kazuki Seshimo, Akira Ota, Daichi Nishio, Satoshi Yamane
Kanazawa University, Kanazawa, Japan
I Introduction
Heterogeneous mixture learning is a method for analyzing big data with heterogeneity. It is a batch method: it generates models with the batch EM algorithm [1], which processes the entire dataset in every iteration before updating the parameters. We therefore propose online heterogeneous mixture learning based on the incremental EM algorithm [2,3,4], an online variant of EM. Online heterogeneous mixture learning can converge faster than the batch version while reaching the same accuracy.
II Online Heterogeneous Mixture Learning
We propose an online form of heterogeneous mixture learning that uses an online EM algorithm for Gaussian mixture models. We first introduce the incremental EM algorithm [2,3].
With the parameters fixed, we compute the responsibilities and the change in the responsibilities. When an observation is processed, we update the responsibility of that single data point.
[TABLE]
We then compute the change in the responsibility.
[TABLE]
[TABLE]
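The single-point E-step can be sketched as follows, assuming a one-dimensional Gaussian mixture. This follows the generic incremental EM of [2] rather than the paper's exact equations, and the names `gaussian_pdf`, `responsibilities`, and `incremental_e_step` are hypothetical. With the parameters held fixed, it recomputes the responsibilities of one data point and returns their change.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior component probabilities for x with the parameters held fixed."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def incremental_e_step(n, x, resp_table, weights, means, variances):
    """Recompute the responsibilities of data point n only.

    Returns the new responsibilities and their change (new - old); the change
    is what an incremental M-step adds to the sufficient statistics.
    """
    new = responsibilities(x, weights, means, variances)
    delta = [g_new - g_old for g_new, g_old in zip(new, resp_table[n])]
    return new, delta
```

For example, a point that previously had responsibilities [0.5, 0.5] and lies on the first component's mean yields a change close to [+0.5, -0.5].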
With the responsibilities and their changes held fixed, we update each parameter.
[TABLE]
[TABLE]
[TABLE]
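The parameter update can be sketched by keeping per-component sufficient statistics (expected count, first moment, second moment) and patching them with each point's change in responsibility, in the style of incremental EM [2]. This is a hypothetical one-dimensional sketch; `incremental_em`, `params_from_stats`, and the variance floor of 1e-6 are assumptions, not the paper's notation.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior component probabilities for x with the parameters held fixed."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def params_from_stats(counts, s1, s2, n_total):
    """M-step: mixing weights, means and variances from the sufficient statistics."""
    safe = [max(c, 1e-12) for c in counts]  # avoid dividing by an empty component
    weights = [c / n_total for c in counts]
    means = [a / c for a, c in zip(s1, safe)]
    variances = [max(b / c - m * m, 1e-6) for b, c, m in zip(s2, safe, means)]
    return weights, means, variances


def incremental_em(data, weights, means, variances, sweeps=5):
    """Incremental EM for a 1-D Gaussian mixture (sketch after Neal and Hinton)."""
    K = len(weights)
    # One full E-step so the statistics are consistent with the starting parameters.
    resp = [responsibilities(x, weights, means, variances) for x in data]
    counts = [sum(r[k] for r in resp) for k in range(K)]
    s1 = [sum(r[k] * x for r, x in zip(resp, data)) for k in range(K)]
    s2 = [sum(r[k] * x * x for r, x in zip(resp, data)) for k in range(K)]
    weights, means, variances = params_from_stats(counts, s1, s2, len(data))
    for _ in range(sweeps):
        for n, x in enumerate(data):
            # E-step for point n only, then patch the statistics with the change.
            new = responsibilities(x, weights, means, variances)
            for k in range(K):
                d = new[k] - resp[n][k]
                counts[k] += d
                s1[k] += d * x
                s2[k] += d * x * x
            resp[n] = new
            # M-step from the patched statistics.
            weights, means, variances = params_from_stats(counts, s1, s2, len(data))
    return weights, means, variances
```

Each point update costs O(K), but the stored per-point responsibilities make memory grow with the number of data points.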
The key components of heterogeneous mixture learning are the factorized information criterion (FIC) and factorized asymptotic Bayesian inference (FAB) [1], and both must be made to work online. First, we modify FIC, the criterion used to evaluate the model. Second, we modify FAB to follow the change in FIC.
The FIC adapted for online learning is given below.
[TABLE]
[TABLE]
FIC cannot be evaluated directly because the parameters cannot be determined analytically. Instead, FAB maximizes an asymptotically consistent lower bound of FIC. To allow incremental updates, we modify FAB to use the change in the variational distribution of the latent variables.
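For intuition about the lower bound, the E-step that maximizes it in FAB [1] can be written, assuming the standard linearization of the log-count term, as the usual EM score multiplied by exp(-D_k / (2 N_k)), where D_k is the number of free parameters of component k and N_k its expected count from the previous iteration; components whose N_k collapses are then removed. The sketch below assumes that form for a one-dimensional Gaussian mixture (D_k = 2). `fab_responsibilities` and `surviving_components` are hypothetical names, and the paper's online modification of FAB is not reproduced here.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fab_responsibilities(x, weights, means, variances, prev_counts, n_params):
    """FAB-style E-step for one point (assumed form).

    Each component's EM score is multiplied by exp(-D_k / (2 N_k)), where
    D_k = n_params[k] is its number of free parameters and N_k = prev_counts[k]
    is its expected count from the previous iteration. Small components are
    penalized more, which drives their counts toward zero.
    """
    scores = [
        w * gaussian_pdf(x, m, v) * math.exp(-d / (2.0 * max(nk, 1e-12)))
        for w, m, v, nk, d in zip(weights, means, variances, prev_counts, n_params)
    ]
    total = sum(scores)
    return [s / total for s in scores]


def surviving_components(counts, threshold=1.0):
    """Indices of components whose expected count is still above the threshold."""
    return [k for k, nk in enumerate(counts) if nk > threshold]
```

With identical densities, a component whose expected count is 1 is scaled by exp(-1) while one with count 100 is scaled by exp(-0.01), so the smaller component loses responsibility.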
We carry out the computation sequentially by repeating the following two steps a fixed number of times.
First, we optimize the distribution of the latent variables, computing the distribution of the latent variables and its change for each additional data point.
Second, we optimize the components of the Gaussian mixture and their parameters.
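One possible reading of this two-step loop, combining the single-point update with FAB-style shrinkage and pruning, is sketched below for a one-dimensional Gaussian mixture. It is a hypothetical illustration, not the authors' algorithm: `online_fab`, the pruning `threshold`, the shrinkage form exp(-D_k / (2 N_k)), and the choice to re-estimate parameters after every point and prune after every sweep are all assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def stats_from_q(q, data):
    """Expected count and first/second moments of each component under q."""
    n_comp = len(q[0])
    counts = [sum(row[k] for row in q) for k in range(n_comp)]
    s1 = [sum(row[k] * x for row, x in zip(q, data)) for k in range(n_comp)]
    s2 = [sum(row[k] * x * x for row, x in zip(q, data)) for k in range(n_comp)]
    return counts, s1, s2


def params_from_stats(counts, s1, s2, n_total):
    """Mixing weights, means and variances from the statistics."""
    safe = [max(c, 1e-12) for c in counts]  # avoid dividing by an emptied component
    weights = [c / n_total for c in counts]
    means = [a / c for a, c in zip(s1, safe)]
    variances = [max(b / c - m * m, 1e-6) for b, c, m in zip(s2, safe, means)]
    return weights, means, variances


def online_fab(data, weights, means, variances, T=5, threshold=1.0, dof=2):
    """Hypothetical online FAB-style loop for a 1-D Gaussian mixture.

    dof is the number of free parameters per component (mean and variance).
    """
    n_total = len(data)
    # Initial full E-step so the statistics agree with the starting parameters.
    q = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        q.append([j / total for j in joint])
    counts, s1, s2 = stats_from_q(q, data)
    weights, means, variances = params_from_stats(counts, s1, s2, n_total)
    for _ in range(T):
        # Step 1: per-point FAB-style update of q(z_n); its change patches the statistics.
        for n, x in enumerate(data):
            scores = [
                w * gaussian_pdf(x, m, v) * math.exp(-dof / (2.0 * max(c, 1e-12)))
                for w, m, v, c in zip(weights, means, variances, counts)
            ]
            total = sum(scores)
            new = [s / total for s in scores]
            for k, (g_new, g_old) in enumerate(zip(new, q[n])):
                d = g_new - g_old
                counts[k] += d
                s1[k] += d * x
                s2[k] += d * x * x
            q[n] = new
            # Step 2: re-estimate the parameters from the patched statistics.
            weights, means, variances = params_from_stats(counts, s1, s2, n_total)
        # After each sweep, drop components whose expected count fell below threshold.
        keep = [k for k, c in enumerate(counts) if c > threshold]
        if keep and len(keep) < len(counts):
            pruned = []
            for row in q:
                kept = [row[k] for k in keep]
                s = sum(kept)
                pruned.append([g / s for g in kept] if s > 0 else [1.0 / len(keep)] * len(keep))
            q = pruned
            counts, s1, s2 = stats_from_q(q, data)
            weights, means, variances = params_from_stats(counts, s1, s2, n_total)
    return weights, means, variances
```

Started with a third component far from all data, the loop removes it after the first sweep and fits the remaining two.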
III Experimental Results
We compare conventional batch heterogeneous mixture learning [1] with the online heterogeneous mixture learning proposed in this paper under the same environment and conditions.
The training data are random samples drawn from a Gaussian mixture. A Gaussian mixture is specified by three parameters: the means, the covariances, and the mixing coefficients. We generated the dataset for this experiment by specifying these three parameters and the number of dimensions; TABLE 1 gives the details.
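The dataset construction can be sketched as below. The text states only that the means and covariances are random, so this sketch assumes diagonal covariances with per-coordinate standard deviations drawn uniformly from [0.5, 2.0] and means drawn uniformly from [-10, 10]; these ranges and the name `make_mixture_dataset` are hypothetical.

```python
import random


def make_mixture_dataset(n, mixing, dim, seed=0):
    """Sample n points from a Gaussian mixture with random means and diagonal covariances.

    Assumptions (not from the paper): means ~ Uniform(-10, 10) per coordinate,
    standard deviations ~ Uniform(0.5, 2.0) per coordinate, diagonal covariances.
    """
    rng = random.Random(seed)
    n_comp = len(mixing)
    means = [[rng.uniform(-10.0, 10.0) for _ in range(dim)] for _ in range(n_comp)]
    stds = [[rng.uniform(0.5, 2.0) for _ in range(dim)] for _ in range(n_comp)]
    # Pick a component for each point according to the mixing coefficients.
    labels = rng.choices(range(n_comp), weights=mixing, k=n)
    # Draw each coordinate independently from the chosen component.
    data = [[rng.gauss(means[c][d], stds[c][d]) for d in range(dim)] for c in labels]
    return data, labels, means, stds
```

Calling `make_mixture_dataset(10000, [0.1, 0.2, 0.3, 0.4], 10)` matches the sizes in TABLE 1.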
To compare the convergence speed of online and batch learning, we recorded the value of FIC at each iteration; the number of iterations until convergence was also part of the evaluation. Each experiment was run 10 times, and we report the average. We varied the number of data points over [500, 10000] (Fig. 1) and the number of dimensions over [2, 4, 20] (Fig. 2), keeping the other parameters fixed.
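The two measurements, the per-iteration criterion curve averaged over runs and the number of iterations until convergence, can be computed as in the sketch below. The stopping rule (absolute change in the criterion below `tol`) is not given in the text and is an assumption, as are the helper names.

```python
from statistics import mean


def iterations_to_convergence(trace, tol=1e-4):
    """Iteration count at which the criterion first changes by less than tol.

    trace[i] is the criterion value after iteration i + 1.
    """
    for i in range(1, len(trace)):
        if abs(trace[i] - trace[i - 1]) < tol:
            return i + 1
    return len(trace)  # the rule was never met within the recorded iterations


def summarize_runs(traces, tol=1e-4):
    """Average criterion curve and average iterations-to-convergence over repeated runs."""
    length = min(len(t) for t in traces)
    mean_curve = [mean(t[i] for t in traces) for i in range(length)]
    mean_iters = mean(iterations_to_convergence(t, tol) for t in traces)
    return mean_curve, mean_iters
```

With 10 recorded traces per setting, `summarize_runs` returns one averaged curve per method, which can then be plotted against the iteration number.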
In all figures, online learning performs better than batch learning when the number of iterations is small. When online and batch algorithms are compared in general, the batch algorithm usually converges faster; for the EM algorithm, however, the online algorithm converges faster than the batch algorithm.
IV Conclusion
We proposed online heterogeneous mixture learning to speed up the convergence of machine learning on heterogeneous data. It reaches the same accuracy as the batch method in fewer iterations. A remaining direction is the stepwise EM algorithm [4], whose working memory is scalable.
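For comparison, stepwise EM as described in [3], closely related to the on-line EM of [4], replaces the per-point responsibility table with running sufficient statistics updated as s <- (1 - eta_t) s + eta_t s(x_t), with step size eta_t = (t + 2)^(-alpha) and 0.5 < alpha <= 1, so memory no longer grows with the data. The one-dimensional sketch below assumes that schedule; `stepwise_em` and alpha = 0.7 are illustrative choices.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def stepwise_em(stream, weights, means, variances, alpha=0.7):
    """Stepwise EM for a 1-D Gaussian mixture, one pass over a data stream (sketch).

    Only K running statistics are stored, so memory does not grow with the
    number of points. Step size eta_t = (t + 2) ** -alpha with 0.5 < alpha <= 1.
    """
    # Running normalized statistics E[z_k], E[z_k x], E[z_k x^2] implied by the start.
    m0 = list(weights)
    m1 = [w * mu for w, mu in zip(weights, means)]
    m2 = [w * (v + mu * mu) for w, mu, v in zip(weights, means, variances)]
    for t, x in enumerate(stream):
        eta = (t + 2) ** (-alpha)
        joint = [w * gaussian_pdf(x, mu, v) for w, mu, v in zip(weights, means, variances)]
        total = sum(joint)
        resp = [j / total for j in joint]
        for k, r in enumerate(resp):
            # Interpolate between the old statistics and this point's statistics.
            m0[k] = (1.0 - eta) * m0[k] + eta * r
            m1[k] = (1.0 - eta) * m1[k] + eta * r * x
            m2[k] = (1.0 - eta) * m2[k] + eta * r * x * x
        # Parameters implied by the running statistics.
        total_m0 = sum(m0)
        safe = [max(a, 1e-12) for a in m0]
        weights = [a / total_m0 for a in m0]
        means = [b / a for a, b in zip(safe, m1)]
        variances = [max(c / a - mu * mu, 1e-6) for a, c, mu in zip(safe, m2, means)]
    return weights, means, variances
```

Because recent points receive weight eta_t, the stream should be roughly shuffled; a stream sorted by component would make the estimates drift toward the most recent cluster.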
References
- [1] Ryohei Fujimaki and Satoshi Morinaga, "Factorized Asymptotic Bayesian Inference for Mixture Modeling," Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 22:400-408, 2012.
- [2] Radford M. Neal and Geoffrey E. Hinton, "A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants," in Learning in Graphical Models, 1998.
- [3] Percy Liang and Dan Klein, "Online EM for Unsupervised Models," Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL '09), pp. 611-619, 2009.
- [4] Masa-aki Sato and Shin Ishii, "On-line EM Algorithm for the Normalized Gaussian Network," Neural Computation, vol. 12, no. 2, pp. 407-432, 2000.
