Mathematical Foundations of Data Cohesion

Katherine E. Moore

arXiv:2308.02546 · cs.SI · August 8, 2023 · 1 citation



TL;DR

This paper studies the mathematical properties of data cohesion, a distance-comparison measure inspired by social interactions. It proves key properties concerning clustered "point-like" sets, local density, and the influence of outliers, and illustrates them with examples.

Contribution

It provides foundational results characterizing data cohesion, including a uniqueness theorem, intended to guide cohesion-based methods in exploratory data analysis and human-aided computation.

Findings

1. Cohesion allows highly clustered ("point-like") sets to behave like single weighted points.

2. It complements metric-adjacent measures of dissimilarity and responds to local density.

3. Cohesion is the unique function whose average value is one-half and in which the influence of an outlier is proportional to its mass.

Abstract

Data cohesion, a recently introduced measure inspired by social interactions, uses distance comparisons to assess relative proximity. In this work, we provide a collection of results which can guide the development of cohesion-based methods in exploratory data analysis and human-aided computation. Here, we observe the important role of highly clustered "point-like" sets and the ways in which cohesion allows such sets to take on qualities of a single weighted point. In doing so, we see how cohesion complements metric-adjacent measures of dissimilarity and responds to local density. We conclude by proving that cohesion is the unique function with (i) average value equal to one-half and (ii) the property that the influence of an outlier is proportional to its mass. Properties of cohesion are illustrated with examples throughout.
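The abstract does not restate how cohesion is computed. As a hedged illustration, the sketch below assumes the definition from partitioned local depth (PaLD), where cohesion was introduced (Berenhaut, Moore and Melvin, PNAS, 2022). For each other point y, the "local focus" of the pair (x, y) is the set of points at least as close to x or to y as the two are to each other. Each point w in that focus lends support to whichever of x and y it is closer to, with ties split evenly. The function name `cohesion` is hypothetical. Reading "average value equal to one-half" as the mean of the row sums (the local depths) is our assumption, and the final check only illustrates that reading.

```python
import numpy as np


def cohesion(D):
    """Cohesion matrix C from a symmetric distance matrix D (assumed PaLD definition).

    C[x, w] = 1/(n-1) * sum over y != x of
        1{w in U_xy} / |U_xy| * (1{d(w,x) < d(w,y)} + 0.5 * 1{d(w,x) == d(w,y)}),
    where U_xy = {z : d(z,x) <= d(x,y) or d(z,y) <= d(x,y)} is the local focus.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    C = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if y == x:
                continue
            r = D[x, y]
            # Local focus: points at least as close to x or to y as x and y are to each other.
            in_focus = (D[:, x] <= r) | (D[:, y] <= r)
            size = in_focus.sum()  # always >= 2, since x and y are in the focus
            # Support each point gives to x over y; ties are split evenly.
            support = np.where(D[:, x] < D[:, y], 1.0,
                               np.where(D[:, x] == D[:, y], 0.5, 0.0))
            C[x] += in_focus * support / size
    return C / (n - 1)


# Usage: random planar points with Euclidean distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
C = cohesion(D)
local_depth = C.sum(axis=1)  # row sums of the cohesion matrix
print(local_depth.mean())    # 0.5 up to floating-point rounding
```

Why the mean is one-half under this definition: the pairs (x, y) and (y, x) share the same focus, and every point in it splits one unit of support between x and y, so each unordered pair contributes exactly 1 to the total. Summing over the n(n-1)/2 pairs and dividing by n-1 gives n/2, so the n local depths average 1/2 regardless of the distances.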


Taxonomy

Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · COVID-19 epidemiological studies