On a Subset Metric

Richard Castro; Zhibin Chang; Ethan Ha; Evan Hall; Hiren Maharaj

arXiv:2302.13433 · math.MG · February 28, 2023

Open Access

TL;DR

This paper defines a metric on the set of all finite subsets of a bounded metric space, generalizing the sequence-subset distance that Song, Cai and Immink introduced to study error-correcting codes for DNA-based data storage.

Contribution

It generalizes the sequence-subset distance to finite subsets of any bounded metric space, and it complements Eiter and Mannila's study of how distance functions extend to subsets of a space.

Findings

01. Defines a metric on the set of all finite subsets of a bounded metric space

02. Generalizes the sequence-subset distance used to study error-correcting codes for DNA-based data storage

03. Complements the work of Eiter and Mannila on extending distance functions to subsets of a space

Abstract

For a bounded metric space X, we define a metric on the set of all finite subsets of X. This generalizes the sequence-subset distance introduced by Wentu Song, Kui Cai and Kees A. Schouhamer Immink to study error-correcting codes for DNA-based data storage. This work also complements the work of Eiter and Mannila, who study extensions of distance functions to subsets of a space in the context of various applications.

Equations

d_{\chi}(X_{1}, X_{2}) = \sum_{x \in X_{1}} d_{H}(x, \chi(x)) + L(|X_{2}| - |X_{1}|).

d_{S}(X_{1}, X_{2}) = d_{S}(X_{2}, X_{1}) = \min\{d_{\chi}(X_{1}, X_{2}) \mid \chi : X_{1} \to X_{2} \text{ is an injection}\}.
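As a rough illustration of the two formulas above, the following Python sketch evaluates d_S by brute force over all injections for small sets of equal-length strings. It assumes that d_H is the Hamming distance, that L is the common sequence length, and that the smaller set plays the role of X_1 (the arguments are swapped otherwise). The function names are hypothetical and do not come from the paper, and the enumeration is only practical for very small sets.

```python
from itertools import permutations


def hamming(u, v):
    """Hamming distance between two equal-length strings."""
    if len(u) != len(v):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(u, v))


def sequence_subset_distance(X1, X2, L):
    """Brute-force d_S(X1, X2): minimise d_chi over all injections chi."""
    X1, X2 = list(set(X1)), list(set(X2))
    if len(X1) > len(X2):
        # d_S is symmetric, so put the smaller set first.
        X1, X2 = X2, X1
    # Assumption: L is the sequence length; each unmatched element costs L.
    penalty = L * (len(X2) - len(X1))
    # Each ordered selection of |X1| distinct elements of X2 is one injection.
    return min(
        sum(hamming(x, y) for x, y in zip(X1, image)) + penalty
        for image in permutations(X2, len(X1))
    )
```

For example, with X_1 = {ACGT}, X_2 = {ACGA, TTTT} and L = 4, the best injection sends ACGT to ACGA at cost 1, and the one unmatched sequence adds 4.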

d(x, y) \leq M(x) \leq d(x, z) + M(z)

d_{\chi}(A, B) := \sum_{x \in A} d(x, \chi(x)) + \sum_{y \in B \setminus \chi(A)} M(y).

d_{S}(A, B) = d_{S}(B, A) := \min\{d_{\chi}(A, B) \mid \chi : A \to B \text{ is an injection}\}.

M(x) = \sup\{d(x, y) : y \in X\}.
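The general definition can be sketched the same way. The Python below is a minimal brute-force version, assuming a finite ambient space (so the supremum defining M is a maximum), hashable points, and that the smaller set plays the role of A. The name `subset_distance` and its parameters are hypothetical, not taken from the paper.

```python
from itertools import permutations


def subset_distance(A, B, d, space):
    """Brute-force d_S(A, B) for finite subsets of a finite metric space.

    d     : the metric, called as d(x, y)
    space : iterable of all points of the ambient space X (assumed finite)
    """
    A, B = list(set(A)), list(set(B))
    if len(A) > len(B):
        # d_S is symmetric, so put the smaller set first.
        A, B = B, A
    points = list(space)
    # M(y) = sup{d(y, z) : z in X}; a max here because the space is finite.
    M = {y: max(d(y, z) for z in points) for y in B}
    best = None
    # Each ordered selection of |A| distinct elements of B is one injection chi.
    for image in permutations(B, len(A)):
        matched = sum(d(x, y) for x, y in zip(A, image))
        used = set(image)
        unmatched = sum(M[y] for y in B if y not in used)
        cost = matched + unmatched
        if best is None or cost < best:
            best = cost
    return best
```

Taking the integers 0 through 10 with d(x, y) = |x - y| gives M(y) = max(y, 10 - y). For A = {2} and B = {3, 5}, matching 2 to 3 costs 1 + M(5) = 6, while matching 2 to 5 costs 3 + M(3) = 10, so the sketch returns 6.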

h(A, B) := \max\{\max_{a \in A} d(a, B), \max_{b \in B} d(b, A)\}
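For comparison, the formula for h can be sketched in a few lines of Python. This assumes that A and B are finite and non-empty and that d(a, B) denotes the distance from the point a to its nearest point of B; the function name is hypothetical.

```python
def hausdorff(A, B, d):
    """h(A, B) for finite, non-empty sets A and B under the metric d."""

    def point_to_set(p, S):
        # Assumed meaning of d(p, S): distance from p to the nearest point of S.
        return min(d(p, q) for q in S)

    return max(
        max(point_to_set(a, B) for a in A),
        max(point_to_set(b, A) for b in B),
    )
```

With A = {0, 1}, B = {1, 5} and d(x, y) = |x - y|, the farthest point of B from A is 5, at distance 4, so h(A, B) = 4.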

d_{md}(S_{1}, S_{2}) := \frac{1}{2}\left(\sum_{e \in S_{1}} \Delta(e, S_{2}) + \sum_{e \in S_{2}} \Delta(e, S_{1})\right)
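A small Python sketch of d_md follows. It assumes finite, non-empty sets and that Δ(e, S) means the smallest value of Δ(e, e') over e' in S; the names `sum_of_min_distances` and `delta` are hypothetical.

```python
def sum_of_min_distances(S1, S2, delta):
    """d_md(S1, S2): half the sum of nearest-neighbour distances in both directions."""

    def to_set(e, S):
        # Assumed meaning of Delta(e, S): distance from e to its nearest element of S.
        return min(delta(e, other) for other in S)

    forward = sum(to_set(e, S2) for e in S1)
    backward = sum(to_set(e, S1) for e in S2)
    return 0.5 * (forward + backward)
```

With S_1 = {0, 1}, S_2 = {1, 5} and Δ(x, y) = |x - y|, the forward sum is 1 + 0 and the backward sum is 0 + 4, giving 2.5.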

d_{s}(S_{1}, S_{2}) := \min_{\eta} \sum_{(e_{1}, e_{2}) \in \eta} \Delta(e_{1}, e_{2})

d_{fs}(S_{1}, S_{2}) := \min_{\eta} \sum_{(e_{1}, e_{2}) \in \eta} \Delta(e_{1}, e_{2})

d_{l}(S_{1}, S_{2}) := \min_{R} \sum_{(e_{1}, e_{2}) \in R} \Delta(e_{1}, e_{2})

\nu(x) = \begin{cases} \chi(x) & \text{if } x \neq x_{0}, \\ x_{0} & \text{if } x = x_{0}. \end{cases}

\sum_{x \in X_{1}} d(x, \nu(x)) = \sum_{x \in X_{1}} d(x, \chi(x)) - d(x_{0}, \chi(x_{0})).

\sum_{y \in X_{2} \setminus \nu(X_{1})} M(y) = \sum_{y \in X_{2} \setminus \chi(X_{1})} M(y) - M(x_{0}) + M(\chi(x_{0})).

d_{\nu}(X_{1}, X_{2}) = d_{\chi}(X_{1}, X_{2}) + M(\chi(x_{0})) - M(x_{0}) - d(x_{0}, \chi(x_{0})).
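The last identity is the sum of the two preceding ones. As a worked step, assume (as the formulas suggest, although the surrounding argument is not shown here) that x_0 lies in both X_1 and X_2 and is not in the image \chi(X_1), so that \nu is again an injection and d(x_0, \nu(x_0)) = 0:

```latex
\begin{align*}
d_{\nu}(X_{1}, X_{2})
  &= \sum_{x \in X_{1}} d(x, \nu(x)) + \sum_{y \in X_{2} \setminus \nu(X_{1})} M(y) \\
  &= \Bigl(\sum_{x \in X_{1}} d(x, \chi(x)) - d(x_{0}, \chi(x_{0}))\Bigr)
   + \Bigl(\sum_{y \in X_{2} \setminus \chi(X_{1})} M(y) - M(x_{0}) + M(\chi(x_{0}))\Bigr) \\
  &= d_{\chi}(X_{1}, X_{2}) + M(\chi(x_{0})) - M(x_{0}) - d(x_{0}, \chi(x_{0})).
\end{align*}
```

Under these assumptions, the inequality M(x) \leq d(x, z) + M(z) with x = \chi(x_{0}) and z = x_{0} makes the correction term non-positive, so d_{\nu}(X_{1}, X_{2}) \leq d_{\chi}(X_{1}, X_{2}).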

\mu(x) = \begin{cases} \chi(x) & \text{if } x \neq x_{1}, z, \\ x_{1} & \text{if } x = x_{1}, \\ y & \text{if } x = z. \end{cases}

d_{\chi}(X_{1}, X_{2})

d_{S}(X_{1}, X_{2}) = d_{S}(X_{1} \setminus X_{2}, X_{2} \setminus X_{1}).

\eta(x) = \begin{cases} \chi(x) & \text{if } x \neq a, \\ c & \text{if } x = a. \end{cases}

d_{S}(X_{1}, X_{2} \cup \{b\})

d_{S}(X_{1}, X_{2}') \leq d_{S}(X_{1}, X_{2}).

d_{S}(X_{1}, X_{2})

\left(d_{S}(X_{3}, X_{2}) - \sum_{i=n+1}^{n+s} d(y_{i}, z_{i}) - \sum_{i=n+s+1}^{n+s+t} M(z_{i})\right) + \sum_{i=n+1}^{n+s+t} M(z_{i})

Taxonomy

Topics: DNA and Biological Computing · Cooperative Communication and Network Coding

Full text

On a Subset Metric

Richard Castro