Information Splitting for Big Data Analytics

Shengxin Zhu; Tongxiang Gu; Xiaowen Xu; Zeyao Mo

arXiv:1607.03390 · stat.CO · September 5, 2016 · CyberC · 2 cites

Open Access

TL;DR

This paper introduces an information splitting technique that simplifies the computation of the Hessian matrix in large-scale statistical models, enabling efficient analysis of big data in fields like genetics and social networks.

Contribution

It proposes a novel information splitting method to approximate the Hessian, reducing computational complexity for big data statistical models.

Findings

01

Significantly reduces computation time for large datasets.

02

Enables application of linear mixed models to big data.

03

Applicable to genetics and social network analysis.

Abstract

Many statistical models require estimating unknown (co)variance parameters. The estimates are usually obtained by maximizing a log-likelihood that involves log-determinant terms. In principle, the Newton method requires the observed information (the negative Hessian matrix, that is, the negative second derivative of the log-likelihood) to obtain an accurate maximum likelihood estimator. Replacing the observed information with the Fisher information, its expected value, yields the Fisher scoring algorithm, which is simpler than the Newton method. With advances in high-throughput technologies in the biological sciences, recommendation systems, and social networks, the sizes of data sets, and of the corresponding statistical models, have suddenly increased by several orders of magnitude. Neither the observed information nor the Fisher…
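To make the Fisher scoring step in the abstract concrete, here is a minimal sketch in Python. It is not the paper's information splitting method. It assumes a toy zero-mean Gaussian model y ~ N(0, V(θ)) whose covariance is linear in the parameters, V(θ) = Σ θ_i D_i. The function name `fisher_scoring` and its arguments are hypothetical and chosen only for illustration. The dense inverse and trace computations are exactly the kind of cost that becomes prohibitive at scale.

```python
import numpy as np

def fisher_scoring(y, derivs, theta0, n_iter=50, tol=1e-10):
    """Toy Fisher scoring for y ~ N(0, V), with V = sum_i theta[i] * derivs[i].

    Assumption (not from the paper): zero mean and a covariance that is
    linear in theta, so dV/dtheta_i is simply derivs[i].
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        V = sum(t * D for t, D in zip(theta, derivs))
        Vinv = np.linalg.inv(V)              # dense inverse: acceptable only for small n
        a = Vinv @ y                         # V^{-1} y
        P = [Vinv @ D for D in derivs]       # V^{-1} dV/dtheta_i
        # Score: gradient of the log-likelihood with respect to theta_i.
        score = np.array([-0.5 * np.trace(Pi) + 0.5 * (a @ D @ a)
                          for Pi, D in zip(P, derivs)])
        # Fisher information: 0.5 * tr(V^{-1} V_i V^{-1} V_j).
        fisher = np.array([[0.5 * np.trace(Pi @ Pj) for Pj in P] for Pi in P])
        step = np.linalg.solve(fisher, score)
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta
```

For the single-parameter case V = θI, calling `fisher_scoring(y, [np.eye(len(y))], [1.0])` returns the mean of the squared observations, which is the closed-form maximum likelihood estimate of the variance. A Newton iteration would replace `fisher` with the negative Hessian of the log-likelihood, which additionally depends on the data y.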

Taxonomy

Topics: Neural Networks and Applications · Bioinformatics and Genomic Networks · Complex Network Analysis Techniques