A General Framework for Association Analysis of Heterogeneous Data
Gen Li, Irina Gaynanova

TL;DR
This paper introduces a general framework for analyzing associations between two sets of high-dimensional, heterogeneous data (continuous, binary, or count), with an application to music annotation.
Contribution
It models heterogeneous variables with exponential family distributions and uses a structured decomposition of the underlying natural parameter matrices to identify patterns that are shared by the two data sets and patterns that are individual to each.
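To make the modeling idea concrete, the sketch below simulates one continuous (Gaussian) and one binary (Bernoulli) data set whose natural parameter matrices share a common score matrix. The specific decomposition used here (column intercepts, a shared score matrix `U0`, and individual scores `U1`, `U2`), along with all function and variable names, is an illustrative assumption and not necessarily the exact parameterization in the paper.

```python
import numpy as np

def simulate_heterogeneous_pair(n=100, p1=20, p2=15, r0=2, r1=1, r2=1, seed=0):
    """Hypothetical simulator: Gaussian X1 and binary X2 linked through
    shared scores in their natural parameter matrices.

    Assumed (illustrative) decomposition:
        Theta1 = 1 mu1^T + U0 V1^T + U1 A1^T
        Theta2 = 1 mu2^T + U0 V2^T + U2 A2^T
    """
    rng = np.random.default_rng(seed)
    ones = np.ones((n, 1))
    # Joint scores shared by both data sets, plus individual scores for each
    U0 = rng.standard_normal((n, r0))
    U1 = rng.standard_normal((n, r1))
    U2 = rng.standard_normal((n, r2))
    # Column intercepts and loading matrices
    mu1, mu2 = rng.standard_normal((p1, 1)), rng.standard_normal((p2, 1))
    V1, V2 = rng.standard_normal((p1, r0)), rng.standard_normal((p2, r0))
    A1, A2 = rng.standard_normal((p1, r1)), rng.standard_normal((p2, r2))

    theta1 = ones @ mu1.T + U0 @ V1.T + U1 @ A1.T
    theta2 = ones @ mu2.T + U0 @ V2.T + U2 @ A2.T

    # Gaussian with unit variance: the natural parameter is the mean
    X1 = theta1 + rng.standard_normal((n, p1))
    # Bernoulli: the mean is the logistic function of the natural parameter
    prob2 = 1.0 / (1.0 + np.exp(-theta2))
    X2 = rng.binomial(1, prob2)
    return X1, X2, theta1, theta2

X1, X2, theta1, theta2 = simulate_heterogeneous_pair()
```

A count-valued data set would follow the same pattern with a Poisson family, drawing `rng.poisson(np.exp(theta))`. The point of working on the natural parameter scale is that one low-rank structure can be shared across data types whose observed values live on very different scales.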
Findings
Analysis of the CAL500 music data revealed meaningful relationships between acoustic features and semantic annotations.
The framework enables automatic music annotation and retrieval.
The proposed methods outperform existing approaches in handling high-dimensional heterogeneous data.
Abstract
Multivariate association analysis is of primary interest in many applications. Despite the prevalence of high-dimensional and non-Gaussian data (such as count-valued or binary), most existing methods only apply to low-dimensional data with continuous measurements. Motivated by the Computer Audition Lab 500-song (CAL500) music annotation study, we develop a new framework for the association analysis of two sets of high-dimensional and heterogeneous (continuous/binary/count) data. We model heterogeneous random variables using exponential family distributions, and exploit a structured decomposition of the underlying natural parameter matrices to identify shared and individual patterns for two data sets. We also introduce a new measure of the strength of association, and a permutation-based procedure to test its significance. An alternating iteratively reweighted least squares algorithm is…
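The abstract mentions a permutation-based test for the significance of the association measure. The sketch below shows the generic permutation procedure only: it shuffles the rows of one data set to break the sample pairing and compares the observed statistic with the resulting null distribution. Because the paper's own association measure is not given here, the RV coefficient is used as a stand-in statistic; it is an assumption for illustration, not the measure proposed by the authors.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two data matrices with matched rows.
    Stand-in statistic (assumption); NOT the paper's association measure."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx = Xc @ Xc.T
    Sy = Yc @ Yc.T
    return np.trace(Sx @ Sy) / np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))

def permutation_test(X, Y, statistic=rv_coefficient, n_perm=199, seed=0):
    """Permutation p-value for association between the rows of X and Y.
    Shuffling the rows of Y breaks the pairing of samples (the null of no
    association) while keeping each data set's own structure intact."""
    rng = np.random.default_rng(seed)
    observed = statistic(X, Y)
    null_stats = np.array([
        statistic(X, Y[rng.permutation(Y.shape[0])]) for _ in range(n_perm)
    ])
    # Add-one correction so the p-value is never exactly zero
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perm)
    return observed, p_value
```

In the paper's setting, `statistic` would be replaced by the proposed association measure computed from the fitted model, which makes each permutation considerably more expensive than this closed-form stand-in.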
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Sensory Analysis and Statistical Methods
