Maximizing Diversity in (near-)Median String Selection
Diptarka Chakraborty, Rudrayan Kundu, Nidhi Purohit, Aravinda Kanchana Ruwanpathirana

TL;DR
This paper develops algorithms that find multiple diverse (near-)optimal median strings under Hamming distance, making data aggregation more robust in applications such as bioinformatics and pattern recognition.
Contribution
It introduces exact and approximation algorithms for generating diverse near-optimal median strings, utilizing structural insights and error-correcting code techniques.
Findings
Exact algorithm for diameter variant identifies maximally diverse median pairs.
$(1-\epsilon)$-approximation algorithm for sum dispersion.
Bi-criteria approximation for min dispersion enables multiple diverse medians.
Abstract
Given a set of strings over a specified alphabet, identifying a median or consensus string that minimizes the total distance to all input strings is a fundamental data aggregation problem. When the Hamming distance is considered as the underlying metric, this problem has extensive applications, ranging from bioinformatics to pattern recognition. However, modern applications often require the generation of multiple (near-)optimal yet diverse median strings to enhance flexibility and robustness in decision-making. In this study, we address this need by focusing on two prominent diversity measures: sum dispersion and min dispersion. We first introduce an exact algorithm for the diameter variant of the problem, which identifies pairs of near-optimal medians that are maximally diverse. Subsequently, we propose a $(1-\epsilon)$-approximation algorithm (for any $\epsilon > 0$) for sum…
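To make the quantities in the abstract concrete, the following minimal Python sketch computes one Hamming median by column-wise plurality vote, and evaluates the two diversity measures under their standard definitions: sum dispersion as the sum of pairwise distances, min dispersion as the smallest pairwise distance. It is an illustration only, not one of the paper's algorithms; all function names are hypothetical, and it assumes every input string has the same length.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def total_distance(candidate: str, strings: list[str]) -> int:
    """Median objective: sum of distances from candidate to every input."""
    return sum(hamming(candidate, s) for s in strings)


def hamming_median(strings: list[str]) -> str:
    """One optimal median: the most frequent symbol in each column.

    Ties in a column mean several optimal medians exist; this sketch
    simply returns one of them.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))


def sum_dispersion(candidates: list[str]) -> int:
    """Sum of pairwise Hamming distances among the candidates."""
    return sum(hamming(a, b) for a, b in combinations(candidates, 2))


def min_dispersion(candidates: list[str]) -> int:
    """Smallest pairwise Hamming distance (needs at least two candidates)."""
    return min(hamming(a, b) for a, b in combinations(candidates, 2))


inputs = ["ACGT", "ACGA", "TCGT", "TCGA"]
median = hamming_median(inputs)
best_cost = total_distance(median, inputs)

# Columns 0 and 3 are tied here, so several strings reach the same cost.
chosen = ["ACGT", "TCGA", "ACGA"]
print(median, best_cost, sum_dispersion(chosen), min_dispersion(chosen))
```

In this toy instance the ties in two columns yield several equally good medians, which is the situation in which selecting a diverse subset becomes meaningful. The near-optimal setting studied in the paper additionally admits candidates whose total distance is slightly above the optimum.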
Taxonomy
Topics: Machine Learning and Algorithms · Gene expression and cancer classification · Algorithms and Data Compression
