Data Fusion by Matrix Factorization

Marinka Žitnik, Blaž Zupan

arXiv:1307.0803 · cs.LG · February 9, 2015

TL;DR

This paper introduces a penalized matrix tri-factorization method for data fusion that integrates multiple heterogeneous data sources to improve prediction accuracy in biological tasks.

Contribution

The paper presents a novel matrix factorization approach for data fusion that can handle diverse data types and demonstrates its effectiveness in gene function and pharmacologic action prediction.

Findings

01 Outperforms alternative data integration methods.

02 Achieves higher accuracy than single data sources.

03 Effective across multiple biological datasets.

Abstract

For most problems in science and engineering we can obtain data sets that describe the observed system from various perspectives and record the behavior of its individual components. Heterogeneous data sets can be collectively mined by data fusion. Fusion can focus on a specific target relation and exploit directly associated data together with contextual data and data about the system's constraints. In this paper we describe a data fusion approach with penalized matrix tri-factorization (DFMF) that simultaneously factorizes data matrices to reveal hidden associations. The approach can directly consider any data that can be expressed in a matrix, including those from feature-based representations, ontologies, associations, and networks. We demonstrate the utility of DFMF for a gene function prediction task with eleven different data sources and for prediction of pharmacologic actions by fusing…
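
To make "simultaneously factorizes data matrices" concrete, here is a minimal sketch of collective matrix tri-factorization, in which each relation matrix R_ij between object types i and j is approximated as G_i S_ij G_j^T and every G_i is shared by all relations involving type i. This is not the authors' DFMF algorithm: it omits the penalty terms entirely, constrains all factors to be non-negative, assumes non-negative input matrices with i != j, and uses standard multiplicative updates. The names `fuse` and `total_error` and the object types "gene", "term", and "drug" are hypothetical, chosen only for illustration.

```python
import numpy as np

def fuse(relations, ranks, n_iter=200, eps=1e-9, seed=0):
    """Jointly factorize R_ij ~ G_i @ S_ij @ G_j.T (simplified sketch, no penalties).

    relations: dict mapping (i, j) -> non-negative array R_ij, with i != j.
    ranks:     dict mapping object type -> factorization rank k.
    """
    rng = np.random.default_rng(seed)
    sizes = {}
    for (i, j), R in relations.items():
        sizes[i], sizes[j] = R.shape
    # One shared factor G per object type, one S per relation.
    G = {t: rng.random((sizes[t], ranks[t])) for t in sizes}
    S = {(i, j): rng.random((ranks[i], ranks[j])) for (i, j) in relations}

    for _ in range(n_iter):
        # Update each relation-specific middle factor S_ij.
        for (i, j), R in relations.items():
            num = G[i].T @ R @ G[j]
            den = G[i].T @ G[i] @ S[i, j] @ G[j].T @ G[j] + eps
            S[i, j] *= num / den
        # Update each shared factor G_t using every relation that involves t.
        for t in G:
            num = np.zeros_like(G[t])
            den = np.zeros((ranks[t], ranks[t]))
            for (i, j), R in relations.items():
                if i == t:              # t indexes the rows of R
                    B = S[i, j] @ G[j].T
                    num += R @ B.T
                    den += B @ B.T
                if j == t:              # t indexes the columns of R
                    B = G[i] @ S[i, j]
                    num += R.T @ B
                    den += B.T @ B
            G[t] *= num / (G[t] @ den + eps)
    return G, S

def total_error(relations, G, S):
    """Sum of squared Frobenius reconstruction errors over all relations."""
    return sum(
        np.linalg.norm(R - G[i] @ S[i, j] @ G[j].T) ** 2
        for (i, j), R in relations.items()
    )
```

Under these assumptions, a call such as `fuse({("gene", "term"): R1, ("gene", "drug"): R2}, {"gene": 3, "term": 3, "drug": 3})` returns one factor per object type. Because the "gene" factor is fitted against both matrices at once, information from one relation can inform the reconstruction of the other, which is the intuition behind fusion by factorization.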
