Dependency detection with similarity constraints
Leo Lahti, Samuel Myllykangas, Sakari Knuutila, Samuel Kaski

TL;DR
This paper introduces constrained variants of canonical correlation analysis (CCA) that improve dependency detection between paired data sets, especially when samples are few, and demonstrates them on a cancer gene discovery application.
Contribution
The paper proposes constraining the projections of the two views to be similar, which reduces the degrees of freedom, and hence the overfitting, of CCA-style dependency detection.
Findings
Similarity constraints improve detection of known cancer genes.
Constraints reduce overfitting in small-sample dependency detection.
Application to gene discovery demonstrates practical benefits.
Abstract
Unsupervised two-view learning, or detection of dependencies between two paired data sets, is typically done by some variant of canonical correlation analysis (CCA). CCA searches for a linear projection for each view, such that the correlations between the projections are maximized. The solution is invariant to any linear transformation of either or both of the views; for tasks with small sample size such flexibility implies overfitting, which is even worse for more flexible nonparametric or kernel-based dependency discovery methods. We develop variants which reduce the degrees of freedom by assuming constraints on similarity of the projections in the two views. A particular example is provided by a cancer gene discovery application where chromosomal distance affects the dependencies between gene copy number and activity levels. Similarity constraints are shown to improve detection…
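The contrast between unconstrained CCA and a similarity-constrained projection can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' formulation: it assumes the strongest possible constraint, namely that both views have the same dimensionality and share one projection vector w, and it maximizes the surrogate 2 w'Cxy w / (w'Cxx w + w'Cyy w), which lower-bounds the correlation whenever w'Cxy w is positive. The function names, the `ridge` parameter, and the synthetic data in `demo` are all hypothetical choices made for the example.

```python
import numpy as np


def _cov(x, y):
    """Sample cross-covariance between the columns of x and y."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    return x.T @ y / (len(x) - 1)


def _inv_sqrt(c):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def cca_first_pair(x, y, ridge=1e-3):
    """First pair of (ridge-regularized) CCA directions, one per view."""
    cxx = _cov(x, x) + ridge * np.eye(x.shape[1])
    cyy = _cov(y, y) + ridge * np.eye(y.shape[1])
    kx, ky = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # Whiten both views, then take the leading singular vectors.
    u, _, vt = np.linalg.svd(kx @ _cov(x, y) @ ky)
    return kx @ u[:, 0], ky @ vt[0]


def tied_first_direction(x, y, ridge=1e-3):
    """Single direction w used for BOTH views (assumed constraint wx == wy).

    Solves the generalized eigenproblem A w = lambda B w with
    A = (Cxy + Cyx) / 2 and B = (Cxx + Cyy) / 2 + ridge * I.
    """
    cxy = _cov(x, y)
    a = (cxy + cxy.T) / 2
    b = (_cov(x, x) + _cov(y, y)) / 2 + ridge * np.eye(x.shape[1])
    k = _inv_sqrt(b)
    _, vecs = np.linalg.eigh(k @ a @ k)  # eigenvalues in ascending order
    return k @ vecs[:, -1]


def projected_corr(x, y, wx, wy):
    """Pearson correlation between the two one-dimensional projections."""
    return np.corrcoef(x @ wx, y @ wy)[0, 1]


def demo(seed=0, n_train=15, n_test=2000, dim=10):
    """Compare training and held-out correlation on synthetic paired data."""
    rng = np.random.default_rng(seed)
    loading = rng.normal(size=dim)  # same loading in both views (assumption)

    def sample(n):
        z = rng.normal(size=(n, 1))  # shared latent signal
        x = z * loading + rng.normal(size=(n, dim))
        y = z * loading + rng.normal(size=(n, dim))
        return x, y

    x_tr, y_tr = sample(n_train)
    x_te, y_te = sample(n_test)
    wx, wy = cca_first_pair(x_tr, y_tr)
    w = tied_first_direction(x_tr, y_tr)
    return {
        "cca_train": projected_corr(x_tr, y_tr, wx, wy),
        "cca_test": projected_corr(x_te, y_te, wx, wy),
        "tied_train": projected_corr(x_tr, y_tr, w, w),
        "tied_test": projected_corr(x_te, y_te, w, w),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

Comparing the training and held-out values returned by `demo` shows how large the gap is for each estimator on a given draw; with few samples relative to the dimensionality, unconstrained CCA can reach a training correlation close to 1 regardless of the true dependency. A fully tied projection is only one extreme; softer constraints that merely encourage the two projections to be similar sit between it and ordinary CCA.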
