Analysis of multiple data sequences with different distributions: defining common principal component axes by ergodic sequence generation and multiple reweighting composition

Ikuo Fukuda; Kei Moritsugu

arXiv:2104.08141·stat.ME·April 19, 2021

TL;DR

This paper introduces a method to define common principal component axes for multiple data sequences with different distributions by using ergodic sampling and reweighting techniques, enabling fair comparison across diverse datasets.

Contribution

It proposes a novel approach combining ergodic sequence generation and reweighting to find common PCA axes for multiple distributions, addressing a key challenge in multisequence analysis.

Findings

01. Effective common PC axes for diverse sequences
02. Accurate recovery of target distributions through reweighting
03. Enhanced comparison of multi-distribution data sets

Abstract

Principal component analysis (PCA) defines a reduced space, described by PC axes, for a given multidimensional data sequence in order to capture the variations of the data. In practice, we need multiple data sequences that accurately obey individual probability distributions, and for a fair comparison of the sequences, we need PC axes that are common to the multiple sequences yet properly capture these multiple distributions. To meet these requirements, we present individual ergodic samplings for these sequences and provide a special reweighting for recovering the target distributions.
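The idea of common PC axes built from reweighted sequences can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the function names `weighted_moments` and `common_pc_axes` are hypothetical, the per-sample weights are assumed to be given (the ergodic sampling step that would produce the sequences is not covered), and pooling the sequences as an equal-weight mixture is an assumption made only for illustration.

```python
import numpy as np


def weighted_moments(x, w):
    """Weighted mean and covariance of one sequence.

    x: array of shape (n_samples, n_dims)
    w: non-negative per-sample weights (assumed given; they stand in for
       whatever reweighting recovers the target distribution)
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    mean = w @ x                         # weighted mean, shape (n_dims,)
    centered = x - mean
    cov = (centered * w[:, None]).T @ centered
    return mean, cov


def common_pc_axes(sequences, weights):
    """Common PC axes for several reweighted sequences.

    Assumption: the sequences are pooled as an equal-weight mixture, so the
    pooled covariance is the average within-sequence covariance plus the
    scatter of the sequence means around the grand mean.
    """
    moments = [weighted_moments(x, w) for x, w in zip(sequences, weights)]
    means = np.array([m for m, _ in moments])
    covs = np.array([c for _, c in moments])

    grand_mean = means.mean(axis=0)
    diffs = means - grand_mean
    pooled = covs.mean(axis=0) + diffs.T @ diffs / len(sequences)

    # eigh returns eigenvalues in ascending order; reverse to get PC1 first
    eigvals, eigvecs = np.linalg.eigh(pooled)
    order = np.argsort(eigvals)[::-1]
    return grand_mean, eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_a = rng.normal(size=(500, 3))
    seq_b = rng.normal(size=(400, 3)) * [3.0, 1.0, 0.5] + [2.0, 0.0, 0.0]
    # Uniform weights here; non-uniform weights would encode the reweighting
    w_a, w_b = np.ones(len(seq_a)), np.ones(len(seq_b))

    mean, variances, axes = common_pc_axes([seq_a, seq_b], [w_a, w_b])
    # Project both sequences onto the same two leading axes for comparison
    proj_a = (seq_a - mean) @ axes[:, :2]
    proj_b = (seq_b - mean) @ axes[:, :2]
    print(variances, proj_a.shape, proj_b.shape)
```

Because both sequences are projected onto one shared set of axes, their low-dimensional coordinates can be compared directly, which separate per-sequence PCA would not allow.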

Taxonomy

Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics · Spectroscopy and Chemometric Analyses