CLARITY -- Comparing heterogeneous data using dissimiLARITY
Daniel J. Lawson, Vinesh Solanki, Igor Yanovich, Johannes Dellert, Damian Ruck, Phillip Endicott

TL;DR
CLARITY is a non-parametric method that compares heterogeneous datasets by decomposing similarities into structural and relationship components, which helps identify and interpret inconsistencies across diverse scientific data.
Contribution
The paper introduces CLARITY, a novel non-parametric approach for comparing heterogeneous datasets by decomposing similarities into structural and relationship components.
Findings
Robust to noise and scaling differences
Applicable across diverse disciplines
Provides interpretable measures of dataset consistency
Abstract
Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale, and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the (dis)similarities between entities are conserved across such different data. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise, and aids in their interpretation. We illustrate this using three diverse comparisons: gene methylation vs expression, evolution of language sounds vs word use, and country-level economic metrics vs cultural beliefs. The non-parametric approach is robust to noise and differences in scaling, and makes only weak assumptions about how the data were generated. It operates by decomposing similarities into two components: a 'structural' component analogous to a clustering,…
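As a rough illustration of comparing two similarity matrices through a shared, clustering-like structure, the sketch below learns a low-rank structure from one matrix, refits only the relationship between structural components on a second matrix, and scores each entity by what is left unexplained. The function names, the eigen-decomposition used to obtain the structure, and the least-squares refit are all assumptions made for this example. It is not the authors' implementation and may differ from the algorithm in the paper.

```python
import numpy as np

def fit_structure(Y, k):
    """Hypothetical helper: rank-k structure A (n x k, orthonormal columns)
    taken from the k largest-magnitude eigenvectors of a symmetric matrix Y."""
    vals, vecs = np.linalg.eigh(Y)
    top = np.argsort(np.abs(vals))[::-1][:k]
    return vecs[:, top]

def fit_relationship(A, Y):
    """Least-squares X minimizing ||Y - A X A^T||_F when A has orthonormal columns."""
    return A.T @ Y @ A

def entity_inconsistency(A, Y):
    """Per-entity score: row norms of the part of Y not explained by A X A^T."""
    residual = Y - A @ fit_relationship(A, Y) @ A.T
    return np.linalg.norm(residual, axis=1)

# Synthetic demo (made-up data): two matrices share a structure A0 but have
# different relationships X1 and X2 between the structural components.
rng = np.random.default_rng(0)
n, k = 30, 3
A0, _ = np.linalg.qr(rng.normal(size=(n, k)))
X1 = rng.normal(size=(k, k)); X1 = X1 + X1.T
X2 = rng.normal(size=(k, k)); X2 = X2 + X2.T
Y1 = A0 @ X1 @ A0.T
Y2 = A0 @ X2 @ A0.T

A = fit_structure(Y1, k)                      # structure learned from dataset 1
scores_consistent = entity_inconsistency(A, Y2)

# Perturb entity 5 in the second dataset so it no longer follows the structure.
Y3 = Y2.copy()
Y3[5, :] += 1.0
Y3[:, 5] += 1.0
scores_perturbed = entity_inconsistency(A, Y3)
flagged = int(np.argmax(scores_perturbed))
```

In this toy setting the second matrix is fully explained by the structure learned from the first, even though the relationship between components differs, so the scores stay near zero until one entity is perturbed. Real dissimilarity data would be noisy and would need a principled choice of `k`.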
Taxonomy
Topics: Opinion Dynamics and Social Influence · Language and cultural evolution · Complex Network Analysis Techniques
