Data-fusion using factor analysis and low-rank matrix completion

Daniel Ahfock; Saumyadipta Pyne; Geoffrey J. McLachlan

arXiv:2104.02888·stat.ME·April 8, 2021

TL;DR

This paper combines factor analysis with low-rank matrix completion to estimate the full covariance matrix in the statistical file-matching problem, proving identifiability of the model and demonstrating the approach on real datasets.

Contribution

The paper proves the identifiability of the factor analysis model in the statistical file-matching problem and develops an EM algorithm for estimating the full covariance matrix from incomplete data.

Findings

01. Factor analysis-based method outperforms traditional low-rank completion in reconstruction error.

02. Theoretical conditions for model identifiability are established.

03. Empirical results on real datasets validate the approach's effectiveness.

Abstract

Data-fusion involves the integration of multiple related datasets. The statistical file-matching problem is a canonical data-fusion problem in multivariate analysis, where the objective is to characterise the joint distribution of a set of variables when only strict subsets of marginal distributions have been observed. Estimation of the covariance matrix of the full set of variables is challenging given the missing-data pattern. Factor analysis models use lower-dimensional latent variables in the data-generating process, and this introduces low-rank components in the complete-data matrix and the population covariance matrix. The low-rank structure of the factor analysis model can be exploited to estimate the full covariance matrix from incomplete data via low-rank matrix completion. We prove the identifiability of the factor analysis model in the statistical file-matching problem under…
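
To illustrate why the low-rank structure makes the missing part of the covariance matrix recoverable, here is a minimal sketch at the population level. It is not the paper's estimation procedure. It assumes a hypothetical split of the variables into X (seen in both files), Y (file A only) and Z (file B only), a known number of factors q, and an exact covariance of the form Lambda Lambda^T + Psi with diagonal Psi. It further assumes X can be split into two disjoint groups X1 and X2 whose loadings each have rank q. All function names, dimensions and index sets are illustrative assumptions.

```python
import numpy as np

# Hypothetical dimensions: X is observed in both files, Y only in file A,
# Z only in file B, and Q is the (assumed known) number of latent factors.
P_X, P_Y, P_Z, Q = 6, 3, 4, 2


def factor_covariance(loadings, uniquenesses):
    """Population covariance of a factor model: Lambda Lambda^T + Psi."""
    return loadings @ loadings.T + np.diag(uniquenesses)


def rank_q_pinv(matrix, q):
    """Pseudo-inverse that keeps only the q largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[:q].T @ np.diag(1.0 / s[:q]) @ u[:, :q].T


def complete_yz_block(sigma, idx_x1, idx_x2, idx_y, idx_z, q):
    """Recover Cov(Y, Z) from blocks that file-matching data can estimate.

    Only off-diagonal blocks are read, so the diagonal uniqueness terms
    never enter: Cov(Y, X1), Cov(X2, X1) and Cov(X2, Z).
    """
    s_y_x1 = sigma[np.ix_(idx_y, idx_x1)]
    s_x2_x1 = sigma[np.ix_(idx_x2, idx_x1)]
    s_x2_z = sigma[np.ix_(idx_x2, idx_z)]
    return s_y_x1 @ rank_q_pinv(s_x2_x1, q) @ s_x2_z


# Simulated loadings and uniquenesses (illustrative values, not real data).
rng = np.random.default_rng(0)
p = P_X + P_Y + P_Z
loadings = rng.normal(size=(p, Q))
uniquenesses = rng.uniform(0.5, 1.5, size=p)
sigma = factor_covariance(loadings, uniquenesses)

idx_x1 = np.arange(0, 3)             # first half of X
idx_x2 = np.arange(3, P_X)           # second half of X
idx_y = np.arange(P_X, P_X + P_Y)    # variables seen only in file A
idx_z = np.arange(P_X + P_Y, p)      # variables seen only in file B

# Cov(Y, Z) is never observed in file-matching; it is kept here only
# to check the reconstruction.
truth = sigma[np.ix_(idx_y, idx_z)]
recovered = complete_yz_block(sigma, idx_x1, idx_x2, idx_y, idx_z, Q)
print("max abs error:", np.max(np.abs(recovered - truth)))
```

The reconstruction is exact here only because the true covariance is used. With sample covariances from two finite files the same formula would give an approximation, which is one reason a likelihood-based method such as the EM algorithm mentioned above may be preferable in practice.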
