Comparison of Distance Metrics for Hierarchical Data in Medical Databases
Diman Hassan, Uwe Aickelin, Christian Wagner

TL;DR
This paper compares various distance metrics, including hierarchical and non-hierarchical types, applied to medical hierarchical data from the THIN database, evaluating their effectiveness for patient similarity and clustering.
Contribution
It provides a comparative analysis of distance metrics for hierarchical medical data, highlighting their suitability for patient similarity and clustering tasks.
Findings
Metrics perform differently based on data structure
All metrics can effectively discriminate patient groups
pq-gram metric is useful for hierarchical data comparison
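To make the pq-gram finding concrete, here is a minimal sketch of a pq-gram profile and the normalised pq-gram distance between two small labelled trees. It follows the commonly described construction (each node's p ancestors combined with a sliding window of q children, padded with a dummy label `*`). The tree encoding as `(label, children)` tuples, the function names, the default `p=2, q=3`, and the labels in the usage example are illustrative assumptions, not the representation or settings used in the paper.

```python
from collections import Counter

DUMMY = "*"  # padding label for missing ancestors or siblings


def pq_gram_profile(tree, p=2, q=3):
    """Return the bag (Counter) of pq-grams of a tree given as (label, [children])."""
    profile = Counter()

    def visit(node, ancestors):
        label, children = node
        anc = ancestors[1:] + (label,)      # shift this node into the ancestor window
        sib = (DUMMY,) * q                  # sibling window starts fully padded
        if not children:
            profile[anc + sib] += 1         # a leaf contributes one all-dummy sibling window
            return
        for child in children:
            sib = sib[1:] + (child[0],)     # slide the window over the children
            profile[anc + sib] += 1
            visit(child, anc)
        for _ in range(q - 1):              # trailing padding after the last child
            sib = sib[1:] + (DUMMY,)
            profile[anc + sib] += 1

    visit(tree, (DUMMY,) * p)
    return profile


def pq_gram_distance(t1, t2, p=2, q=3):
    """Normalised pq-gram distance: 1 - 2 * |P1 bag-intersect P2| / (|P1| + |P2|)."""
    p1 = pq_gram_profile(t1, p, q)
    p2 = pq_gram_profile(t2, p, q)
    shared = sum((p1 & p2).values())        # bag intersection size
    total = sum(p1.values()) + sum(p2.values())
    return 1.0 - 2.0 * shared / total


# Usage with made-up labels (not real clinical codes):
patient_a = ("root", [("A", []), ("B", [])])
patient_b = ("root", [("A", []), ("C", [])])
distance_ab = pq_gram_distance(patient_a, patient_b)
```

The distance is 0 for identical trees and 1 when the two profiles share no pq-gram, so it can be compared directly with other metrics scaled to the same range.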
Abstract
Distance metrics are widely used in many research areas and applications, such as bioinformatics and data mining. Some metrics, such as pq-gram and Edit Distance, are used specifically for data with a hierarchical structure. Others, such as the geometric and Hamming metrics, are used for non-hierarchical data. We have applied these metrics to The Health Improvement Network (THIN) database, which contains some hierarchical data. For the first group of metrics, the THIN data has to be converted into a tree-like structure. For the second group, the data are converted into a frequency table or matrix. For all metrics, all pairwise distances are then computed and normalised. Based on this particular data set, our research question is: which of these metrics is useful for THIN data? This paper compares the metrics, particularly the pq-gram metric on finding the…
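The non-hierarchical route described in the abstract (frequency table, then pairwise distances, then normalisation) can be sketched as follows. This is only an illustration under stated assumptions: the patient records and code labels are made up, Euclidean distance stands in for the "geometric" metrics, Hamming distance is taken as the number of table columns whose counts differ, and normalisation is done by dividing by the largest pairwise distance. The abstract does not specify any of these details.

```python
import math


def frequency_table(patients):
    """Map each patient id to a vector of code counts over a shared, sorted code list."""
    codes = sorted({code for events in patients.values() for code in events})
    return {pid: [events.count(code) for code in codes]
            for pid, events in patients.items()}


def euclidean(u, v):
    """Euclidean distance between two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hamming(u, v):
    """Number of positions at which the two count vectors differ (assumed definition)."""
    return sum(1 for a, b in zip(u, v) if a != b)


def normalised_pairwise(table, metric):
    """All pairwise distances, scaled to [0, 1] by the largest one (assumed scheme)."""
    ids = sorted(table)
    raw = {(a, b): metric(table[a], table[b])
           for i, a in enumerate(ids) for b in ids[i + 1:]}
    largest = max(raw.values(), default=0.0)
    return {pair: (d / largest if largest else 0.0) for pair, d in raw.items()}


# Hypothetical records: each patient is a list of (made-up) event codes.
patients = {"p1": ["A", "A", "B"], "p2": ["A", "B"], "p3": ["C"]}
table = frequency_table(patients)
euclidean_distances = normalised_pairwise(table, euclidean)
hamming_distances = normalised_pairwise(table, hamming)
```

Scaling every metric to the same range is what makes their outputs comparable side by side; a different normalisation would change the absolute values but not the ordering of patient pairs within a single metric.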
