Semblance: A Rank-Based Kernel on Probability Spaces for Niche Detection
Divyansh Agarwal, Nancy R. Zhang

TL;DR
Semblance is a distribution-free, rank-based kernel for measuring similarity on probability spaces. It places greater weight on observation pairs at the outskirts of the data distribution, which helps it detect niche features across diverse data types.
Contribution
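The paper proposes Semblance, a kernel that aids niche-feature detection by emphasizing similarity between observation pairs at the outskirts of the distribution rather than near its center. The authors prove that it is a valid Mercer kernel, so it can be used in kernel-based learning methods.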
Findings
Semblance consistently outperforms traditional similarity measures in simulations.
It is effective in diverse applications such as genomics, finance, and image processing.
It is proven to be a valid Mercer kernel, which permits its use in kernel-based machine learning.
Abstract
In data science, determining proximity between observations is critical to many downstream analyses such as clustering, information retrieval and classification. However, when the underlying structure of the data probability space is unclear, the function used to compute similarity between data points is often arbitrarily chosen. Here, we present a novel concept of proximity, Semblance, that uses the empirical distribution across all observations to inform the similarity between each pair. The advantage of Semblance lies in its distribution-free formulation and its ability to detect niche features by placing greater emphasis on similarity between observation pairs that fall at the outskirts of the data distribution, as opposed to those that fall towards the center. We prove that Semblance is a valid Mercer kernel, thus allowing its principled use in kernel-based learning machines.…
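To make the idea of a similarity informed by the empirical distribution more concrete, here is a minimal Python sketch. It is not the paper's definition of Semblance, which is not given in the excerpt above. It assumes a simplified stand-in: each feature is mapped to its empirical-CDF position, and the similarity of two observations is one minus the share of the empirical distribution lying between them, averaged over features. The function name `rank_similarity` is hypothetical. This stand-in illustrates only the distribution-free, rank-based aspect, not the emphasis on the outskirts.

```python
import numpy as np

def rank_similarity(X):
    """Illustrative rank-based similarity (an assumption, NOT the paper's formula).

    X: array of shape (n_samples, n_features).
    For each feature, every value is replaced by its empirical-CDF position u,
    and the per-feature similarity of observations i and j is 1 - |u_i - u_j|.
    The result is the average over features.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    K = np.zeros((n, n))
    for g in range(p):
        col = X[:, g]
        # Empirical CDF at each observation: fraction of values <= col[i].
        u = (col[None, :] <= col[:, None]).mean(axis=1)
        K += 1.0 - np.abs(u[:, None] - u[None, :])
    return K / p

# One feature, four observations; the last value is far away in raw units.
X = np.array([[1.0], [2.0], [3.0], [100.0]])
K = rank_similarity(X)
print(K)

# A Mercer kernel must yield a symmetric positive semidefinite Gram matrix;
# this checks that property numerically for the stand-in on this sample.
print(np.linalg.eigvalsh(K).min() >= -1e-9)
```

Because only ranks enter the computation, the observation at 100 is as similar to its rank neighbor at 3 as the observation at 2 is to 3, regardless of the raw gap. The eigenvalue check at the end is a numerical sanity check on one sample, not a substitute for the proof the abstract refers to.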
Taxonomy
Topics: Single-cell and spatial transcriptomics · Domain Adaptation and Few-Shot Learning · Hydrological Forecasting Using AI
