Reversibility of distance measures of states with some focus on total variation distance

Keiji Matsumoto

arXiv:1907.10604 · quant-ph · July 3, 2025

TL;DR

This paper investigates how classical probability distances relate to quantum states, showing that total variation distance can sometimes be preserved under measurement, contrary to previous assumptions, with specific conditions identified.

Contribution

The paper extends the understanding of distance measure reversibility from operator convex functions to strictly convex functions and provides conditions under which total variation distance remains unchanged.

Findings

01

Total variation distance can be preserved under measurement for certain quantum states.

02

Extension of reversibility results to strictly convex functions beyond operator convex functions.

03

Necessary and sufficient conditions identified for qubit states regarding total variation distance preservation.


Equations

$$\mathrm{D}_{f}(p\|q):=\sum_{x\in\mathcal{X}}q_{x}f(p_{x}/q_{x}),$$

$0\cdot f(p/0):$

$\hat{f}(0):$

$$\mathrm{D}_{f}^{\max}(\rho\|\sigma)=\min_{(\Gamma,\{p,q\})}\mathrm{D}_{f}(p\|q),$$

$$\rho=\Gamma(p),\quad\sigma=\Gamma(q).$$

$$\mathrm{D}_{f}^{\max}(\rho\|\sigma)=\mathrm{tr}\,\sigma f(\sigma^{-1/2}\rho\sigma^{-1/2}),$$

$$f_{\alpha}(r)=(\pm)r^{\alpha},\quad\alpha\in(-1,1),$$

$$\|p-q\|_{1}=\sum_{x\in\mathcal{X}}|p_{x}-q_{x}|,$$

$$p_{x}^{\prime}=\mathrm{tr}\,M_{x}\rho,\quad q_{x}^{\prime}=\mathrm{tr}\,\sigma M_{x}.$$

$$\mathrm{D}_{f}^{\max}(\rho\|\sigma)\geq\mathrm{D}_{f}(p^{\prime}\|q^{\prime}).$$

$$f(cr_{1}+(1-c)r_{2})<cf(r_{1})+(1-c)f(r_{2}),\quad 0<c<1,$$

$$r_{x}:=\left\{\begin{array}{cc}p_{x}/q_{x},&x\notin\mathcal{X}_{0},\\ \infty,&x\in\mathcal{X}_{0}.\end{array}\right.$$

$$p_{y}^{\prime}=\sum_{x\in\mathcal{X}}P(y|x)p_{x},\quad q_{y}^{\prime}=\sum_{x\in\mathcal{X}}P(y|x)q_{x},$$

$$\mathrm{D}_{f}(p\|q)=\mathrm{D}_{f}(p^{\prime}\|q^{\prime})<\infty.$$

$$f(r)=f_{0}(r)+\hat{f}(0)\,r,$$

$$\mathrm{D}_{f}(p\|q)=\sum_{x\notin\mathcal{X}_{0}}q_{x}f_{0}(r_{x})+\sum_{x\in\mathcal{X}}p_{x}\hat{f}(0).$$

$$\begin{aligned}\mathrm{D}_{f}(p^{\prime}\|q^{\prime})&=\sum_{y\notin\mathcal{Y}_{0}}q_{y}^{\prime}f_{0}(r_{y}^{\prime})+\sum_{y\in\mathcal{Y}}p_{y}^{\prime}\hat{f}(0)\\&=\sum_{y\notin\mathcal{Y}_{0}}q_{y}^{\prime}f_{0}\Big(\sum_{x\notin\mathcal{X}_{0}}Q(x|y)r_{x}+\frac{1}{q_{y}^{\prime}}\sum_{x\in\mathcal{X}_{0}}P(y|x)p_{x}\Big)+\sum_{y\in\mathcal{Y}}p_{y}^{\prime}\hat{f}(0),\end{aligned}$$

$$Q(x|y):=\frac{q_{x}}{q_{y}^{\prime}}P(y|x),\quad x\notin\mathcal{X}_{0},\ y\notin\mathcal{Y}_{0}.$$

$$\begin{aligned}\mathrm{D}_{f}(p^{\prime}\|q^{\prime})&\leq\sum_{x\notin\mathcal{X}_{0}}\sum_{y\notin\mathcal{Y}_{0}}Q(x|y)q_{y}^{\prime}f_{0}(r_{x})+\sum_{y\in\mathcal{Y}}p_{y}^{\prime}\hat{f}(0)\\&=\sum_{x\notin\mathcal{X}_{0}}q_{x}f_{0}(r_{x})+\sum_{y\in\mathcal{Y}}p_{y}^{\prime}\hat{f}(0)\\&=\mathrm{D}_{f}(p\|q).\end{aligned}$$

$$\sum_{x\in\mathcal{X}_{0}}P(y|x)p_{x}=0,\quad y\notin\mathcal{Y}_{0}.$$

$$Q(x|y)=0,\quad r_{x}\neq r_{y}^{\prime},\ x\notin\mathcal{X}_{0},\ y\notin\mathcal{Y}_{0}.$$

$$P(y|x)=0,\quad r_{x}\neq r_{y}^{\prime},\ y\notin\mathcal{Y}_{0}.$$

$$P(y|x)=0,\quad y\in\mathcal{Y}_{0},\ x\notin\mathcal{X}_{0}.$$

$$P(y|x)=\mathrm{tr}\,M_{y}\Gamma(\delta_{x}),$$

$\rho_{r}:$

$\tilde{M}_{r}:$

Taxonomy

Topics: Mathematical Inequalities and Applications · Mathematical functions and polynomials · Quantum Information and Cryptography

Full text

Reversibility of distance measures of states with some focus on total variation distance

Keiji Matsumoto

Quantum Computation Group, National Institute of Informatics,

2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430,

e-mail:[email protected]

Abstract

Consider a classical system whose state is described by a probability distribution $p$ or $q$, and embed this classical information into a quantum system by a physical map $\Gamma$: $\rho=\Gamma(p)$ and $\sigma=\Gamma(q)$. Intuitively, the pair $\{p_{\rho}^{M},p_{\sigma}^{M}\}$ of distributions of the outcomes of a measurement $M$ on the pair $\{\rho,\sigma\}$ should contain strictly less information than the pair $\{p,q\}$, provided the pair $\{\rho,\sigma\}$ is non-commutative. Indeed, this statement has been shown when the information is measured by an $f$-divergence with $f$ operator convex. In this paper, the statement is extended to the case where $f$ is strictly convex. We also disprove the assertion for the total variation distance $\|p-q\|_{1}$, the $f$-divergence with $f(r)=|1-r|$: if $\{\rho,\sigma\}$ satisfies some not very restrictive conditions, $\|p_{\rho}^{M}-p_{\sigma}^{M}\|_{1}$ equals $\|p-q\|_{1}$. We present a sufficient condition for the general case, and a necessary and sufficient condition for qubit states.

1 Introduction

Consider a classical system whose state is described by a probability distribution $p$ or $q$, and embed this classical information into a quantum system by a physical map $\Gamma$: $\rho=\Gamma(p)$ and $\sigma=\Gamma(q)$. Intuitively, the pair $\{p_{\rho}^{M},p_{\sigma}^{M}\}$ of distributions of the outcomes of a measurement $M$ on the pair $\{\rho,\sigma\}$ should contain strictly less information than the pair $\{p,q\}$, provided the pair $\{\rho,\sigma\}$ is non-commutative. Indeed, this statement has been shown when the information is measured by an $f$-divergence with $f$ operator convex [1]. In this paper, the statement is extended to the case where $f$ is strictly convex. We also disprove the assertion for the total variation distance $\|p-q\|_{1}$, the $f$-divergence with $f(r)=|1-r|$: if $\{\rho,\sigma\}$ satisfies some not very restrictive conditions, $\|p_{\rho}^{M}-p_{\sigma}^{M}\|_{1}$ equals $\|p-q\|_{1}$. We present a sufficient condition for the general case, and a necessary and sufficient condition for qubit states.

2 Embedding Classical Information Into Quantum States

Consider a classical memory system whose state is described by the probability distribution $p$ or $q$, depending on the value of the bit recorded. Suppose we embed this information into a quantum system by some physical operation $\Gamma$, that is, a completely positive trace preserving (CPTP) map from a commutative system into the operators $\mathcal{B}\left(\mathcal{H}\right)$ over a Hilbert space $\mathcal{H}$. (In this paper, we restrict ourselves to the finite dimensional case.) Then we obtain a quantum system whose state is either $\rho=\Gamma(p)$ or $\sigma=\Gamma(q)$, depending on the value of the bit.

Suppose now we are given $\left\{\rho,\sigma\right\}$, and the question is how much information is contained in $\{p,q\}$. The answer depends on the measure of information, and also on the choice of $\Gamma$. In this paper, we use the $f$-divergence between $p$ and $q$ to measure the amount of information:

$$\mathrm{D}_{f}(p\|q):=\sum_{x\in\mathcal{X}}q_{x}f(p_{x}/q_{x}),$$

where $\mathcal{X}$ is a finite set, and $f$ is a convex function. In the definition, we used the convention

[TABLE]

By choosing $f$ properly, the $f$-divergence represents almost all frequently used distance measures (or monotone functions of them): relative entropy (Kullback-Leibler divergence), Rényi relative entropy, total variation distance, and so on.
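As a concrete illustration of the definition $\mathrm{D}_{f}(p\|q)=\sum_{x}q_{x}f(p_{x}/q_{x})$, the following minimal Python sketch evaluates it for the generators $f(r)=r\ln r$ (relative entropy) and $f(r)=|1-r|$ (total variation distance). The function names and the `f_hat0` parameter are hypothetical, and the treatment of terms with $q_{x}=0$ (each contributes $p_{x}$ times a constant supplied by the caller) is an assumed convention, not a transcription of the paper's.

```python
import math

def f_divergence(p, q, f, f_hat0=math.inf):
    """D_f(p||q) = sum_x q_x * f(p_x / q_x) for finite distributions.

    Assumed convention: a term with q_x == 0 < p_x contributes p_x * f_hat0.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if qx > 0:
            total += qx * f(px / qx)
        elif px > 0:
            total += px * f_hat0
    return total

def kl_generator(r):
    # f(r) = r ln r, with the usual limit 0 ln 0 = 0
    return r * math.log(r) if r > 0 else 0.0

def tv_generator(r):
    # f(r) = |1 - r|
    return abs(1.0 - r)

p = [0.5, 0.5]
q = [0.25, 0.75]
tv = f_divergence(p, q, tv_generator, f_hat0=1.0)
kl = f_divergence(p, q, kl_generator)
print(tv, kl)
```

With $f(r)=|1-r|$ the result coincides with $\sum_{x}|p_{x}-q_{x}|$, and with $f(r)=r\ln r$ it coincides with $\sum_{x}p_{x}\ln(p_{x}/q_{x})$.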

As for the dependence on $\Gamma$, we suppose the encoder did their best; thus our question is to find

$$\mathrm{D}_{f}^{\max}(\rho\|\sigma)=\min_{(\Gamma,\{p,q\})}\mathrm{D}_{f}(p\|q),$$

where $\left(\Gamma,\{p,q\}\right)$ ranges over all triples satisfying

$$\rho=\Gamma(p),\quad\sigma=\Gamma(q).\tag{1}$$

Since it is defined by an optimization problem, $\mathrm{D}_{f}^{\max}$ is monotone non-increasing under CPTP maps. Also, when $\left[\rho,\sigma\right]=0$, it reduces to its classical version $\mathrm{D}_{f}$.

If $f$ is operator convex,

$$\mathrm{D}_{f}^{\max}(\rho\|\sigma)=\mathrm{tr}\,\sigma f(\sigma^{-1/2}\rho\sigma^{-1/2}),$$

provided $\sigma>0$ and $\rho>0$ [2]. Examples of operator convex functions are $r\ln r$ and

$$f_{\alpha}(r)=(\pm)r^{\alpha},\quad\alpha\in(-1,1),$$

where the sign $\pm$ is chosen so that $f_{\alpha}$ is convex. The former and the latter correspond to relative entropy and Rényi relative entropy, respectively.

However, the function $|1-r|$, which corresponds to the total variation distance

$$\|p-q\|_{1}=\sum_{x\in\mathcal{X}}|p_{x}-q_{x}|,$$

is not operator convex, and there is no known closed formula for $\mathrm{D}_{\left|1-r\right|}^{\max}(\rho\|\sigma)$.

3 Reversibility

To read classical information from a quantum source $\left\{\rho,\sigma\right\}$, a measurement $M$ is applied to the system, producing the probability distributions

$$p_{x}^{\prime}=\mathrm{tr}\,M_{x}\rho,\quad q_{x}^{\prime}=\mathrm{tr}\,\sigma M_{x}.$$

Obviously,

$$\mathrm{D}_{f}^{\max}(\rho\|\sigma)\geq\mathrm{D}_{f}(p^{\prime}\|q^{\prime}).\tag{2}$$

If $\left[\rho,\sigma\right]=0$, equality can be attained in the above inequality; in fact, if $f$ is strictly convex,

$$f(cr_{1}+(1-c)r_{2})<cf(r_{1})+(1-c)f(r_{2}),\quad 0<c<1,\ r_{1}\neq r_{2},$$

this is the only case in which equality can hold. Some preparations are necessary to prove the assertion. Let $\left\{p,q\right\}$ and $\left\{p^{\prime},q^{\prime}\right\}$ be pairs of probability distributions over finite sets $\mathcal{X}$ and $\mathcal{Y}$, respectively. Define $\mathcal{X}_{0}\colon=\{x;q_{x}=0\}$, and

$$r_{x}\colon=\left\{\begin{array}{cc}p_{x}/q_{x},&x\notin\mathcal{X}_{0},\\ \infty,&x\in\mathcal{X}_{0}.\end{array}\right.$$

$\mathcal{Y}_{0}$ and $r_{y}^{\prime}$ are defined analogously.

Lemma 1

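Suppose there is a transition probability $P(y|x)$ with

$$p_{y}^{\prime}=\sum_{x\in\mathcal{X}}P(y|x)p_{x},\quad q_{y}^{\prime}=\sum_{x\in\mathcal{X}}P(y|x)q_{x},$$

and there is a strictly convex function $f$ on $(0,\infty)$ with

$$\mathrm{D}_{f}(p\|q)=\mathrm{D}_{f}(p^{\prime}\|q^{\prime})<\infty.$$

Then $P(y|x)=0$ for all $x$ and $y$ with $r_{x}\neq r_{y}^{\prime}$.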

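Proof. First, we prove the case where $\hat{f}(0)<\infty$. Then $f(r)$ decomposes into

$$f(r)=f_{0}(r)+\hat{f}(0)\,r,$$

where $f_{0}$ is monotone non-increasing. Then

$$\mathrm{D}_{f}(p\|q)=\sum_{x\notin\mathcal{X}_{0}}q_{x}f_{0}(r_{x})+\sum_{x\in\mathcal{X}}p_{x}\hat{f}(0),$$

$$\begin{aligned}\mathrm{D}_{f}(p^{\prime}\|q^{\prime})&=\sum_{y\notin\mathcal{Y}_{0}}q_{y}^{\prime}f_{0}(r_{y}^{\prime})+\sum_{y\in\mathcal{Y}}p_{y}^{\prime}\hat{f}(0)\\&=\sum_{y\notin\mathcal{Y}_{0}}q_{y}^{\prime}f_{0}\Big(\sum_{x\notin\mathcal{X}_{0}}Q(x|y)r_{x}+\frac{1}{q_{y}^{\prime}}\sum_{x\in\mathcal{X}_{0}}P(y|x)p_{x}\Big)+\sum_{y\in\mathcal{Y}}p_{y}^{\prime}\hat{f}(0),\end{aligned}$$

where

$$Q(x|y):=\frac{q_{x}}{q_{y}^{\prime}}P(y|x),\quad x\notin\mathcal{X}_{0},\ y\notin\mathcal{Y}_{0}.$$

Since $\frac{1}{q_{y}^{\prime}}\sum_{x\in\mathcal{X}_{0}}P(y|x)p_{x}\geq 0$ and $\sum_{x\notin\mathcal{X}_{0}}Q(x|y)=1$, the monotonicity and convexity of $f_{0}$ give

$$\begin{aligned}\mathrm{D}_{f}(p^{\prime}\|q^{\prime})&\leq\sum_{x\notin\mathcal{X}_{0}}\sum_{y\notin\mathcal{Y}_{0}}Q(x|y)q_{y}^{\prime}f_{0}(r_{x})+\sum_{y\in\mathcal{Y}}p_{y}^{\prime}\hat{f}(0)\\&=\sum_{x\notin\mathcal{X}_{0}}q_{x}f_{0}(r_{x})+\sum_{y\in\mathcal{Y}}p_{y}^{\prime}\hat{f}(0)\\&=\mathrm{D}_{f}(p\|q).\end{aligned}$$

Since $f$ is strictly convex, $\mathrm{D}_{f}(p^{\prime}\|q^{\prime})=\mathrm{D}_{f}(p\|q)$ holds only if

$$\sum_{x\in\mathcal{X}_{0}}P(y|x)p_{x}=0,\quad y\notin\mathcal{Y}_{0},$$

and

$$Q(x|y)=0,\quad r_{x}\neq r_{y}^{\prime},\ x\notin\mathcal{X}_{0},\ y\notin\mathcal{Y}_{0}.$$

These are equivalent to

$$P(y|x)=0,\quad r_{x}\neq r_{y}^{\prime},\ y\notin\mathcal{Y}_{0}.$$

Also, the condition $q_{y}^{\prime}=\sum_{x\in\mathcal{X}}P\left(y|x\right)q_{x}$ implies

$$P(y|x)=0,\quad y\in\mathcal{Y}_{0},\ x\notin\mathcal{X}_{0}.$$

Therefore, we have the assertion provided $\hat{f}(0)<\infty$.

Next, we study the case where $\hat{f}(0)=\infty$. Then $\mathrm{D}_{f}(p\|q)=\mathrm{D}_{f}(p^{\prime}\|q^{\prime})<\infty$ implies $\mathcal{X}_{0}=\emptyset$ and $\mathcal{Y}_{0}=\emptyset$. Then, arguing analogously as above, we have the assertion.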
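The content of Lemma 1 can be checked numerically on a small example. The sketch below (hypothetical helper names; every $q_{x}$ is assumed strictly positive, so the set $\mathcal{X}_{0}$ is empty) applies a transition probability that merges two symbols. When the merged symbols share the same ratio $r_{x}$, relative entropy is preserved; when their ratios differ, relative entropy strictly decreases, whereas the total variation distance, whose generator $|1-r|$ is not strictly convex, can still be preserved.

```python
import math

def f_divergence(p, q, f):
    # D_f(p||q) = sum_x q_x f(p_x / q_x); this sketch assumes every q_x > 0.
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

def push_forward(P, p):
    # Apply a transition matrix stored as P[y][x] = P(y|x) to a distribution p.
    return [sum(row[x] * p[x] for x in range(len(p))) for row in P]

def kl(r):
    return r * math.log(r)  # strictly convex generator of relative entropy

def tv(r):
    return abs(1.0 - r)     # convex, but not strictly convex

merge_01 = [[1, 1, 0], [0, 0, 1]]  # send x = 0, 1 to y = 0 and x = 2 to y = 1

# Case A: the merged symbols share the ratio r_x = 0.5, so nothing is lost.
pa, qa = [0.1, 0.2, 0.7], [0.2, 0.4, 0.4]
kl_a_before = f_divergence(pa, qa, kl)
kl_a_after = f_divergence(push_forward(merge_01, pa), push_forward(merge_01, qa), kl)

# Case B: the merged symbols have different ratios 1/3 and 0.6 (both below 1).
pb, qb = [0.1, 0.3, 0.6], [0.3, 0.5, 0.2]
pb2, qb2 = push_forward(merge_01, pb), push_forward(merge_01, qb)
kl_b_before, kl_b_after = f_divergence(pb, qb, kl), f_divergence(pb2, qb2, kl)
tv_b_before, tv_b_after = f_divergence(pb, qb, tv), f_divergence(pb2, qb2, tv)

print(kl_a_before, kl_a_after)
print(kl_b_before, kl_b_after)
print(tv_b_before, tv_b_after)
```

In case B the merged ratios lie on the same side of $1$, where $|1-r|$ is linear; this is why the total variation distance survives the merge while relative entropy does not.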

Theorem 2

Suppose $f$ is a strictly convex function on $(0,\infty)$, and $\mathrm{D}_{f}^{\max}(\rho\|\sigma)<\infty$. Then equality in the inequality (2) holds only if $[\rho,\sigma]=0.$

If $f$ is non-linear and operator convex, it is strictly convex. Therefore, the theorem applies to relative entropy and Rényi relative entropy.

Proof. Let $\left(\Gamma,\left\{p,q\right\}\right)$ be a triple achieving $\mathrm{D}_{f}^{\max}(\rho\|\sigma)=\mathrm{D}_{f}(p\|q)$. Then equality in the inequality (2) holds only if

$$\mathrm{D}_{f}(p\|q)=\mathrm{D}_{f}(p^{\prime}\|q^{\prime})<\infty.$$

Since the composition of $\Gamma$ followed by the measurement $M$ is a linear, positive, and probability preserving map, there is a transition probability $P(y|x)$ such that

$$p_{y}^{\prime}=\sum_{x\in\mathcal{X}}P(y|x)p_{x},\quad q_{y}^{\prime}=\sum_{x\in\mathcal{X}}P(y|x)q_{x},$$

and

$$P(y|x)=\mathrm{tr}\,M_{y}\Gamma(\delta_{x}),$$

where $\delta_{x}$ is the delta distribution at $x$. Therefore, by Lemma 1, $\mathrm{tr}\,M_{y}\Gamma(\delta_{x})=0$ provided $r_{x}\neq r_{y}^{\prime}$.
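To make the transition probability $P(y|x)=\mathrm{tr}\,M_{y}\Gamma(\delta_{x})$ concrete, here is a small sketch for a qubit with real $2\times 2$ matrices. The particular encoding ($\Gamma(\delta_{0})=|0\rangle\langle 0|$, $\Gamma(\delta_{1})=|+\rangle\langle +|$) and the computational-basis measurement are illustrative assumptions, not choices made in the paper.

```python
def trace_product(a, b):
    # tr(a b) for 2x2 real matrices given as nested lists
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))

# Hypothetical encoding: Gamma(delta_x) for x = 0, 1
gamma = [
    [[1.0, 0.0], [0.0, 0.0]],  # |0><0|
    [[0.5, 0.5], [0.5, 0.5]],  # |+><+|
]
# Hypothetical measurement: projectors onto the computational basis
measurement = [
    [[1.0, 0.0], [0.0, 0.0]],  # M_0
    [[0.0, 0.0], [0.0, 1.0]],  # M_1
]

# P[y][x] = tr M_y Gamma(delta_x)
P = [[trace_product(M_y, g_x) for g_x in gamma] for M_y in measurement]

# By linearity, measuring Gamma(p) gives the same distribution as pushing p through P.
p = [0.3, 0.7]
rho = [[sum(p[x] * gamma[x][i][j] for x in range(2)) for j in range(2)] for i in range(2)]
direct = [trace_product(M_y, rho) for M_y in measurement]
via_P = [sum(P[y][x] * p[x] for x in range(2)) for y in range(2)]
print(P, direct, via_P)
```

Each column of `P` sums to one, as required of a transition probability, because the measurement operators sum to the identity and each $\Gamma(\delta_{x})$ has unit trace.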

Define

[TABLE]

Then if $r<\infty$ , observe $\rho_{r}=r\sigma_{r}$ , and

[TABLE]

Therefore, supports of positive operators

[TABLE]

are non-overlapping with each other.

Therefore, the assertion $\left[\rho,\sigma\right]=0$ follows since

[TABLE]

This theorem means that classical information embedded into non-commuting states cannot be recovered completely by any measurement. At first glance, the statement seems almost trivial, but the proof fully exploits the fact that $f$ is strictly convex; in fact, the statement is not true if the information measure is the total variation distance.

4 Total variation distance

4.1 Set up and a general formula

Total variation distance, or the divergence corresponding to $f\left(r\right)=\left|1-r\right|$, is one of the most frequently used distance measures between two probability distributions. Its most common quantum version is

[TABLE]

where $P_{\rho}^{M}$ is the distribution of the outcome of the measurement $M$ under $\rho$ . Obviously,

[TABLE]
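As background (a standard fact, stated here independently of the displays above): for two density operators, the trace norm $\|\rho-\sigma\|_{1}$ is attained as the total variation distance between the outcome distributions of the projective measurement onto the eigenvectors of $\rho-\sigma$. The sketch below checks this for real symmetric $2\times 2$ matrices; the helper names and the example states are assumptions made for illustration.

```python
import math

def eig_sym2(m):
    """Eigenpairs (value, unit vector) of a real symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    v_plus = (math.cos(theta), math.sin(theta))
    v_minus = (-math.sin(theta), math.cos(theta))
    mean, radius = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    return [(mean + radius, v_plus), (mean - radius, v_minus)]

def expectation(v, m):
    # v^T m v, the probability of the outcome with projector |v><v|
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

rho = [[0.75, 0.25], [0.25, 0.25]]
sigma = [[0.5, 0.0], [0.0, 0.5]]
diff = [[rho[i][j] - sigma[i][j] for j in range(2)] for i in range(2)]

eigs = eig_sym2(diff)
trace_norm = sum(abs(lam) for lam, _ in eigs)
measured_tv = sum(abs(expectation(v, rho) - expectation(v, sigma)) for _, v in eigs)
print(trace_norm, measured_tv)
```

Both quantities agree, since each eigenvector $v$ of $\rho-\sigma$ satisfies $v^{T}(\rho-\sigma)v=\lambda$ for its eigenvalue $\lambda$.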

Given a triple $\left(\Gamma,\left\{p,q\right\}\right)$ for $\left\{\rho,\sigma\right\}$, we define $\left(\Gamma^{\prime},\left\{p^{\prime},q^{\prime}\right\}\right)$, where $p^{\prime}$ and $q^{\prime}$ are probability distributions on $\left\{0,1,2\right\}$:

[TABLE]

where

[TABLE]

Then $\left(\Gamma^{\prime},\left\{p^{\prime},q^{\prime}\right\}\right)$ satisfies (1) and $\left\|p^{\prime}-q^{\prime}\right\|_{1}=\left\|p-q\right\|_{1}$ .

(Intuitively, $\Gamma^{\prime}\left(\delta_{0}\right)$ takes care of the common part of the two states, and $\Gamma^{\prime}\left(\delta_{1}\right)$ and $\Gamma^{\prime}\left(\delta_{2}\right)$ compensate for the remainder.)

Therefore, without loss of generality, we may restrict ourselves to the one in the form of (3), where $A$ is an operator with

[TABLE]

Therefore, we have:

[TABLE]

4.2 Reversibility

In this subsection and the next, suppose $\mathrm{tr}\,\rho=\mathrm{tr}\,\sigma=1$ . We study the conditions for

[TABLE]

This implies that any quantum version of statistical distance $\mathrm{D}_{\left|1-r\right|}^{Q}\left(\rho\|\sigma\right)$ equals $\left\|\rho-\sigma\right\|_{1}$. Intuitively, this means that the classical statistical distance encoded into quantum states can be completely retrieved. By Theorem 2, such complete retrieval of an $f$-divergence does not occur if $f$ is non-linear and operator convex, $\rho$ and $\sigma$ do not commute, and $\mathrm{D}_{f}^{\max}(\rho\|\sigma)<\infty$. The statistical distance is very different from the $f$-divergences induced by operator convex functions in this respect.

If we drop the constraint $A\geq 0$ and suppose $\mathrm{tr}\,\rho=\mathrm{tr}\,\sigma$ ,

[TABLE]

Here, the minimum in the third line is achieved if $\rho-A=\,\left[\rho-\sigma\right]_{+}$ . ( $\left[X\right]_{+}$ is the positive part of the self-adjoint operator $X$ .)

Therefore, (5) holds iff

[TABLE]

(Here, $\left|X\right|:=\sqrt{X^{\dagger}X}$ .) Another necessary and sufficient condition is the existence of $A$ , $\Delta_{1}$ , $\Delta_{2}\geq 0$ with

[TABLE]

To see this, observe

[TABLE]

For (5) to hold, the existence of $\Delta_{1}$, $\Delta_{2}$ with $\mathrm{tr}\,\Delta_{1}+\mathrm{tr}\,\Delta_{2}=\left\|\Delta_{1}-\Delta_{2}\right\|_{1}$ is necessary and sufficient. This forces $\Delta_{1}\Delta_{2}=0$.
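The link between $\mathrm{tr}\,\Delta_{1}+\mathrm{tr}\,\Delta_{2}=\left\|\Delta_{1}-\Delta_{2}\right\|_{1}$ and $\Delta_{1}\Delta_{2}=0$ can be illustrated numerically. The sketch below uses real symmetric $2\times 2$ positive operators chosen for illustration (hypothetical values and helper names): a pair with orthogonal supports saturates the identity, and a pair with overlapping supports does not.

```python
import math

def trace_norm_sym2(m):
    # Sum of absolute eigenvalues of a real symmetric 2x2 matrix
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean, radius = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    return abs(mean + radius) + abs(mean - radius)

def trace(m):
    return m[0][0] + m[1][1]

def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

delta1 = [[0.15, 0.15], [0.15, 0.15]]          # 0.3 |+><+|
delta2_orth = [[0.1, -0.1], [-0.1, 0.1]]       # 0.2 |-><-|, orthogonal support
delta2_overlap = [[0.2, 0.0], [0.0, 0.0]]      # 0.2 |0><0|, overlapping support

gap_orth = trace(delta1) + trace(delta2_orth) - trace_norm_sym2(sub(delta1, delta2_orth))
gap_overlap = trace(delta1) + trace(delta2_overlap) - trace_norm_sym2(sub(delta1, delta2_overlap))
product_orth = matmul(delta1, delta2_orth)
print(gap_orth, gap_overlap, product_orth)
```

The gap vanishes exactly when the two positive operators have orthogonal supports; in the overlapping case the sum of traces strictly exceeds the trace norm of the difference.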

Of course, in general, (6) is not true. For example, if $\sigma=\left|\psi\right\rangle\left\langle\psi\right|$ is a pure state,

[TABLE]

where $\rho_{22}\colon=\left(I-\left|\psi\right\rangle\left\langle\psi\right|\right)\rho\left(I-\left|\psi\right\rangle\left\langle\psi\right|\right)$ and $\rho_{22}^{-1}$ denotes its generalized inverse [2].

However, if $\rho$ and $\sigma$ are very close so that

[TABLE]

it is true.

Another sufficient condition is

[TABLE]

To see that this is sufficient, take the square root of both sides of the inequality; we then obtain (6). (Recall that $\sqrt{\cdot}$ is operator monotone. This condition is not necessary, since $r^{2}$ is not operator monotone.) Rearranging the terms, we have

[TABLE]

4.3 2-dimensional case

In this subsection, we assume $\dim\mathcal{H}=2$ and $\mathrm{tr}\,\rho=\mathrm{tr}\,\sigma=1$, and compute the set $\left\{\sigma;\text{(5)}\right\}$ for each fixed $\rho$, using the necessary and sufficient condition given by (7) and (8). As it turns out, this set is a spheroid with focal points $\rho$ and $\mathbf{1}-\rho$, touching the surface of the Bloch sphere at each end of its longest axis.

Since $\mathrm{tr}\,\rho=\mathrm{tr}\,\sigma=1$ ,

[TABLE]

and

[TABLE]

Let $v_{\rho}$, $v_{\sigma}$, $u_{1}$, $u_{2}$, and $u_{A}$ be the Bloch vectors of $\rho$, $\sigma$, $\frac{1}{c}\Delta_{1}$, $\frac{1}{c}\Delta_{2}$, and $\frac{1}{1-c}A$, respectively. Also, (8) holds iff $\Delta_{1}$ and $\Delta_{2}$ are rank-1 and $u_{2}=-u_{1}.$ Therefore, by (7),

[TABLE]

Therefore,

[TABLE]

Let $\left\|\cdot\right\|$ denote the Euclidean norm in $\mathbb{R}^{3}$, and

[TABLE]

The set $\left\{\sigma;\text{(5)}\right\}$ is fairly large. For example, if the largest eigenvalue of $\rho$ is $\leq 0.85$, it occupies more than half of the volume of the Bloch sphere.

If

[TABLE]

the minimization problem (4) is solved explicitly. With $Z:=\mathrm{diag}\left(1,-1\right)$, we have $\sigma=Z\rho Z^{\dagger}$ and $\rho=Z\sigma Z^{\dagger}$. Thus, if $A$ satisfies the constraints of (4), so does $\frac{1}{2}\left(ZAZ^{\dagger}+A\right)$, and $\mathrm{tr}\,A=\mathrm{tr}\,\frac{1}{2}\left(ZAZ^{\dagger}+A\right)$. Therefore, without loss of generality, we suppose $A$ is diagonal. After some elementary analysis, the optimal $A$ turns out to be

[TABLE]

and we have

[TABLE]
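The symmetrization step used in this subsection, replacing $A$ by $\frac{1}{2}\left(ZAZ^{\dagger}+A\right)$ with $Z=\mathrm{diag}(1,-1)$, can be seen at work in a few lines. The sketch below uses a real $2\times 2$ matrix with illustrative entries (an assumption, as are the function names) and shows that the averaged operator is diagonal with the same trace.

```python
def conjugate_by_z(a):
    # For Z = diag(1, -1), Z A Z^dagger flips the sign of the off-diagonal entries.
    return [[a[0][0], -a[0][1]], [-a[1][0], a[1][1]]]

def symmetrize(a):
    # (Z A Z^dagger + A) / 2
    za = conjugate_by_z(a)
    return [[0.5 * (a[i][j] + za[i][j]) for j in range(2)] for i in range(2)]

A = [[0.3, 0.1], [0.1, 0.2]]   # hypothetical feasible operator
A_sym = symmetrize(A)
print(A_sym)
```

The off-diagonal entries cancel while the diagonal entries, and hence the trace, are unchanged, which is why restricting to diagonal $A$ loses no generality here.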

Bibliography

[1] Fumio Hiai, Milan Mosonyi, "Different quantum f-divergences and the reversibility of quantum operations," Reviews in Mathematical Physics, Vol. 29, No. 07 (2017)
[2] K. Matsumoto, "A new quantum version of f-divergence," arXiv:1311.4722 (2013)