Similarity Downselection: A Python implementation of a heuristic search algorithm for finding the set of the n most dissimilar items with an application in conformer sampling
Felicity F. Nielson, Sean M. Colby, Ryan S. Renslow, Thomas O. Metz

TL;DR
This paper introduces similarity downselection (SDS), an open-source Python implementation of a heuristic algorithm that efficiently approximates the set of the n most dissimilar items in a large population. The authors report that it outperforms Monte Carlo methods in both speed and accuracy.
Contribution
The paper presents SDS, a novel heuristic algorithm for quickly approximating the set of the n most dissimilar items, and demonstrates better performance than Monte Carlo methods.
Findings
SDS is significantly faster than Monte Carlo methods.
SDS achieves higher accuracy than Monte Carlo with fewer iterations.
SDS produces solutions close to the exact solution in small-population benchmarks.
Abstract
Finding the set of the n items most dissimilar from each other out of a larger population becomes increasingly difficult and computationally expensive as either n or the population size grows large. Finding the set of the n most dissimilar items is different from simply sorting an array of numbers because there exists a pairwise relationship between each item and all other items in the population. For instance, if you have the set of the most dissimilar n=4 items, one or more of those items might not appear in the most dissimilar set for n=5. An exact solution would have to search all possible combinations of size n in the population, exhaustively. We present open-source software called similarity downselection (SDS), written in Python and freely available on GitHub. SDS implements a heuristic algorithm for quickly finding the approximate set(s) of the n most dissimilar items. We benchmark SDS against…
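To make the search problem in the abstract concrete, the following is a minimal sketch of three generic strategies: exhaustive search over all size-n combinations, a Monte Carlo baseline that keeps the best of many random subsets, and a simple greedy heuristic. It is not the SDS algorithm from the paper. The function names, the random 2-D test points, and the scoring rule (sum of pairwise Euclidean distances) are all assumptions chosen for illustration.

```python
import itertools
import math
import random

import numpy as np


def set_score(dist, idx):
    """Sum of pairwise distances within idx (assumed dissimilarity score)."""
    idx = list(idx)
    return float(dist[np.ix_(idx, idx)].sum() / 2.0)


def exact_search(dist, n):
    """Score every size-n combination; only feasible for small populations."""
    return max(itertools.combinations(range(len(dist)), n),
               key=lambda c: set_score(dist, c))


def monte_carlo_search(dist, n, iterations, rng):
    """Keep the best of `iterations` uniformly random size-n subsets."""
    best, best_score = None, -math.inf
    for _ in range(iterations):
        cand = tuple(sorted(rng.sample(range(len(dist)), n)))
        score = set_score(dist, cand)
        if score > best_score:
            best, best_score = cand, score
    return best


def greedy_search(dist, n):
    """Illustrative greedy heuristic (requires n >= 2), not the paper's SDS.

    Seed with the most distant pair, then repeatedly add the item with the
    largest summed distance to the items already chosen.
    """
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    chosen = [int(i), int(j)]
    while len(chosen) < n:
        gain = dist[:, chosen].sum(axis=1)
        gain[chosen] = -np.inf  # never re-pick an item already in the set
        chosen.append(int(np.argmax(gain)))
    return tuple(sorted(chosen))


# Hypothetical toy population: 12 random points in the unit square.
points = np.random.default_rng(0).random((12, 2))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
n = 4

n_combinations = math.comb(len(dist), n)  # subsets the exact search must score
exact = exact_search(dist, n)
mc = monte_carlo_search(dist, n, iterations=50, rng=random.Random(0))
greedy = greedy_search(dist, n)

print("combinations:", n_combinations)
print("exact score: ", set_score(dist, exact))
print("MC score:    ", set_score(dist, mc))
print("greedy score:", set_score(dist, greedy))
```

The number of combinations the exact search must score grows combinatorially with the population size and n, which is why it is only practical as a reference on small populations. Both approximate strategies return valid size-n sets whose score can never exceed the exact optimum.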
Taxonomy
Topics: Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Statistical Methods and Inference
