Note on bounds for symmetric divergence measures
S. Furuichi, K. Yanagi, K. Kuriyama

TL;DR
This paper extends existing bounds on symmetric divergence measures by introducing classical q-extensions and non-commutative extensions, building on prior results by Gilardoni and Sason.
Contribution
It provides new extensions of tight bounds for symmetric divergence measures, including q-extensions and non-commutative cases, advancing theoretical understanding.
Findings
Derived classical q-extensions of divergence bounds
Developed non-commutative extensions for divergence measures
Built upon Gilardoni and Sason's foundational results
Abstract
I. Sason obtained tight bounds for symmetric divergence measures by applying results established by G. L. Gilardoni. In this article, we report two kinds of extensions of these results, namely a classical q-extension and a non-commutative extension.
Affiliations: Nihon University; Josai University; Yamaguchi University. Corresponding author: [email protected]
Abstract
In [1], tight bounds for symmetric divergence measures are derived by applying the results established in [2]. In this article, we report two kinds of extensions of these results, namely a classical $q$-extension and a non-commutative (quantum) extension.
1 INTRODUCTION
In [1], tight bounds for symmetric divergence measures are derived by applying the results established in [2]. More precisely, [1] treats the minimization problem for the Bhattacharyya coefficient, Chernoff information, Jensen-Shannon divergence and Jeffrey’s divergence under a constraint on the total variation distance. In this article, we report two kinds of extensions of these results, namely a classical $q$-extension and a non-commutative (quantum) extension. The parametric $q$-extension is meant in the sense that the Tsallis entropy [3] converges to the Shannon entropy when $q \to 1$. Namely, all results with the parameter $q$ recover the usual (standard) Shannon results when $q \to 1$. Our extensions are listed as follows.
- (i) The lower bound for the Jensen-Shannon-Tsallis divergence is given by applying the results in [2].
- (ii) The lower bound for the Jeffrey-Tsallis divergence is given by applying the results in [2] and deriving a $q$-Pinsker inequality (Lemma 1) for a restricted range of $q$. This in turn implies new upper bounds (Theorem 1).
- (iii) The lower bound for the quantum Chernoff information is given by the known relation between the trace distance and the fidelity.
- (iv) The lower bound for the quantum Jeffrey divergence is given by applying the monotonicity (data processing inequality) of the quantum $f$-divergence.
2 $q$-EXTENDED CASES
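Here we review some quantities. The total variation distance between two probability distributions $P$ and $Q$ is defined by
$$d_{TV}(P,Q) \equiv \frac{1}{2}\sum_x |P(x)-Q(x)| = \frac{1}{2}\|P-Q\|_1,$$
where $\|\cdot\|_1$ represents the $\ell_1$ norm. The $f$-divergence introduced by Csiszár in [4] is defined by
$$D_f(P\|Q) \equiv \sum_x Q(x)\, f\!\left(\frac{P(x)}{Q(x)}\right),$$
where $f$ is a convex function and $f(1)=0$. If we take $f(t) = -t\ln_q(1/t)$, where $\ln_q(x) \equiv \frac{x^{1-q}-1}{1-q}$ is the $q$-logarithmic function defined for $x>0$ and $q \neq 1$, then the $f$-divergence is equal to the Tsallis relative entropy (Tsallis divergence) defined by (see e.g., [5])
$$D_q(P\|Q) \equiv -\sum_x P(x)\ln_q\frac{Q(x)}{P(x)} = \sum_x \frac{P(x)-P(x)^q Q(x)^{1-q}}{1-q}.$$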
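As a minimal numerical sketch of the quantities reviewed above, the following Python code assumes the standard forms $d_{TV}(P,Q)=\frac{1}{2}\sum_x|P(x)-Q(x)|$, $D_f(P\|Q)=\sum_x Q(x)f(P(x)/Q(x))$, $\ln_q(x)=(x^{1-q}-1)/(1-q)$ and $D_q(P\|Q)=-\sum_x P(x)\ln_q(Q(x)/P(x))$. The function names and the example distributions are hypothetical and chosen only for illustration. The code checks that the choice $f(t)=-t\ln_q(1/t)$ reproduces the Tsallis relative entropy, and that a value of $q$ close to 1 gives nearly the Kullback-Leibler divergence.

```python
import math

def ln_q(x, q):
    """q-logarithm (assumed standard form); equals the natural logarithm at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def total_variation(p, r):
    """Half of the l1 distance between two distributions on the same alphabet."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, r))

def f_divergence(p, r, f):
    """Csiszar f-divergence sum_x r(x) f(p(x)/r(x)); assumes r(x) > 0 for all x."""
    return sum(b * f(a / b) for a, b in zip(p, r))

def tsallis_divergence(p, r, q):
    """Tsallis relative entropy -sum_x p(x) ln_q(r(x)/p(x)); assumes p(x), r(x) > 0."""
    return -sum(a * ln_q(b / a, q) for a, b in zip(p, r))

# Hypothetical example distributions (not taken from the paper).
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]
q_param = 1.5

# Tsallis divergence obtained through the f-divergence with f(t) = -t ln_q(1/t).
via_f = f_divergence(P, Q, lambda t: -t * ln_q(1.0 / t, q_param))
# Tsallis divergence computed directly from its definition.
direct = tsallis_divergence(P, Q, q_param)

# Kullback-Leibler divergence (q = 1) and a value of q very close to 1.
kl = tsallis_divergence(P, Q, 1.0)
near_one = tsallis_divergence(P, Q, 1.0 + 1e-6)

print(total_variation(P, Q), via_f, direct, kl, near_one)
```

The two routes to the Tsallis relative entropy agree up to rounding, which is just the algebraic identity $Q\,f(P/Q) = -P\ln_q(Q/P)$ evaluated numerically.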
In this section, we use the result established by Gilardoni in [2] for the symmetric divergence.
Theorem (Gilardoni, 2006 [2]) We suppose is symmetric divergence (which condition is known as , and is constant number) and with . Then we have
[TABLE]
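The theorem can be illustrated numerically. The sketch below assumes that the bound takes the form $D_f(P\|Q) \geq (1-\varepsilon) f\!\left(\frac{1+\varepsilon}{1-\varepsilon}\right) - 2 f'(1)\varepsilon$ for $d_{TV}(P,Q)=\varepsilon<1$, with the total variation distance normalized to $[0,1]$; this form is an assumption, since the displayed statement is not reproduced here. As the symmetric divergence it uses the classical Jeffrey divergence, with $f(t)=(t-1)\ln t$ and $f'(1)=0$, rather than the $q$-extended quantities of this section. All function names are hypothetical.

```python
import math
import random

def total_variation(p, r):
    """Half of the l1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, r))

def f_jeffreys(t):
    """Generator f(t) = (t - 1) ln t of the classical Jeffrey divergence."""
    return (t - 1.0) * math.log(t)

def jeffreys(p, r):
    """Jeffrey divergence sum_x (p(x) - r(x)) ln(p(x)/r(x)); assumes positive entries."""
    return sum((a - b) * math.log(a / b) for a, b in zip(p, r))

def gilardoni_bound(f, f_prime_at_1, eps):
    """Assumed form of the lower bound for a symmetric f-divergence at d_TV = eps < 1."""
    return (1.0 - eps) * f((1.0 + eps) / (1.0 - eps)) - 2.0 * f_prime_at_1 * eps

def random_distribution(n, rng):
    """Random strictly positive probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Count how many random pairs fall below the assumed bound (expected: none).
rng = random.Random(0)
violations = 0
for _ in range(1000):
    p = random_distribution(4, rng)
    r = random_distribution(4, rng)
    eps = total_variation(p, r)
    if jeffreys(p, r) < gilardoni_bound(f_jeffreys, 0.0, eps) - 1e-12:
        violations += 1

# A two-point pair with d_TV = eps_star that attains the bound.
eps_star = 0.4
p_star = [(1 + eps_star) / 2, (1 - eps_star) / 2]
r_star = [(1 - eps_star) / 2, (1 + eps_star) / 2]
gap = jeffreys(p_star, r_star) - gilardoni_bound(f_jeffreys, 0.0, eps_star)

print(violations, gap)
```

For this choice of $f$ the assumed bound reduces to $2\varepsilon\ln\frac{1+\varepsilon}{1-\varepsilon}$, and the two-point pair attains it up to rounding.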
As corollaries of the above theorem, we obtain the following two propositions. We define the Jensen-Shannon-Tsallis divergence as
[TABLE]
Then with , is convex, with and . Thus we have the following proposition, which is a $q$-parametric extension of Proposition 3 in [1].
Proposition 1
[TABLE]
The equality is achieved when .
We also define the Jeffrey-Tsallis divergence as
[TABLE]
Then with , is convex with and . Thus we have the following proposition, which is a $q$-parametric extension of Proposition 4 in [1].
Proposition 2
[TABLE]
The equality is achieved when .
Here we are able to prove the following lemma, which may be named $q$-Pinsker’s inequality.
Lemma 1
[TABLE]
Proof: The proof is easily done by the fact that implies , putting . Thus we have
[TABLE]
for . Thus we have this lemma by data processing inequality.
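The exact statement of Lemma 1 is not reproduced here, so the following Python sketch is only an illustration under assumed conventions: the Tsallis relative entropy in the form $D_q(P\|Q)=-\sum_x P(x)\ln_q(Q(x)/P(x))$ with the natural logarithm at $q=1$, the total variation distance as half the $\ell_1$ distance, and the classical Pinsker inequality $D_1(P\|Q)\geq 2\,d_{TV}(P,Q)^2$. Under these assumptions $D_q$ is non-decreasing in $q$, so the Pinsker-type bound with the same right-hand side carries over to $q \geq 1$, while it can fail for small $q$. The example distributions and function names are hypothetical.

```python
import math

def ln_q(x, q):
    """q-logarithm (assumed standard form); equals the natural logarithm at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_divergence(p, r, q):
    """Tsallis relative entropy -sum_x p(x) ln_q(r(x)/p(x)); assumes positive entries."""
    return -sum(a * ln_q(b / a, q) for a, b in zip(p, r))

def total_variation(p, r):
    """Half of the l1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, r))

# Hypothetical binary example.
P = [0.9, 0.1]
Q = [0.1, 0.9]

# Right-hand side of the classical Pinsker inequality, 2 * d_TV^2.
pinsker_rhs = 2.0 * total_variation(P, Q) ** 2

# Tsallis relative entropy for several values of q, in increasing order of q.
q_values = [0.1, 0.5, 1.0, 1.5, 2.0]
divergences = [tsallis_divergence(P, Q, q) for q in q_values]

for q, d in zip(q_values, divergences):
    status = "holds" if d >= pinsker_rhs else "fails"
    print(f"q = {q}: D_q = {d:.4f}, bound 2 d_TV^2 = {pinsker_rhs:.4f} ({status})")
```

In this example the bound fails at $q=0.1$ and $q=0.5$ and holds from $q=1$ onward, which is at least consistent with the existence of counter-examples for part of the parameter range.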
As a remark, the above $q$-Pinsker inequality does not hold outside the range of $q$ stated in Lemma 1, since counter-examples exist. Applying this lemma, we can prove the following theorem, whose setting is the same as in [1] except for the extended parameter $q$.
Theorem 1 Consider a memoryless stationary source with alphabet with probability distribution and assume that a uniquely decodable code with an alphabet size . For , we have
[TABLE]
Where ,,, and
Proof: We give a sketch of the proof of this theorem. Firstly is trivial. By Lemma 1, we have
[TABLE]
where , and and are distributions of new random variable . By simple computations with formula , we have
[TABLE]
since the Kraft-McMillan inequality was used. Thus we have
Remark 1 This theorem is a parametric extension of the inequality (32) in [1] in the sense that the left-hand side of our inequality contains the parameter $q$. We also note that the condition imposed in Theorem 1 corresponds to the result in our previous paper [6], so the condition may not be so unnatural within the framework of this topic.
In addition, we compare our upper bound with parameter obtained in Theorem 1 and that obtained in the paper [1]. Actually we give an example such that , where was used in the paper [1] as . Consider the following information source
[TABLE]
with . Then we have the code by Shannon-Fano coding, so that since . By numerical computations, we have and . This means there exists a code such that , which shows our upper bound with the parameter is tighter than the upper bound in the paper [1], in this example. We performed some numerical computations with a few information sources, then we could find the parameter such that for the case .
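The coding step used in this example can be sketched as follows. The source distribution of the example is not reproduced here, so the code uses a hypothetical four-symbol source and a binary code alphabet. It assumes codeword lengths of the Shannon type, $\ell(u)=\lceil -\log_d P(u)\rceil$, and checks the Kraft-McMillan inequality $\sum_u d^{-\ell(u)}\leq 1$ together with the usual average code length.

```python
import math

def shannon_code_lengths(probs, d=2):
    """Codeword lengths ceil(-log_d p(u)) for a code alphabet of size d."""
    return [math.ceil(-math.log(p) / math.log(d)) for p in probs]

def kraft_sum(lengths, d=2):
    """Left-hand side of the Kraft-McMillan inequality, sum_u d^(-l(u))."""
    return sum(d ** (-l) for l in lengths)

def average_length(probs, lengths):
    """Usual average codeword length sum_u p(u) l(u)."""
    return sum(p * l for p, l in zip(probs, lengths))

def shannon_entropy(probs, d=2):
    """Shannon entropy with logarithm to base d."""
    return -sum(p * math.log(p) / math.log(d) for p in probs)

# Hypothetical source (not the one used in the paper's example).
source = [0.4, 0.3, 0.2, 0.1]

lengths = shannon_code_lengths(source)
kraft = kraft_sum(lengths)
avg = average_length(source, lengths)
entropy = shannon_entropy(source)

print(lengths, kraft, avg, entropy)
```

For such lengths the Kraft sum is at most 1 by construction, and the average length lies between the entropy and the entropy plus one.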
However, for the case (e.g., Huffman code), the following proposition can be proven.
Proposition 3 Let and . Then we have the relation .
Proof: We firstly prove the inequality for , where Since , if , then and if , then , thus we have . Putting and , taking summation on both sides by and dividing the both sides by , we have
[TABLE]
When , we thus obtain the inequality , taking account that the usual average code length can be rewritten as .
This proposition shows that for the special (but nontrivial) case , the upper bound given in (32) of the paper [1] is always tighter than ours (for ) obtained in Theorem 1.
3 NON-COMMUTATIVE CASES
Let $\rho$ and $\sigma$ be density matrices (quantum states), that is, positive semi-definite matrices with unit trace. Then the following quantities are well known in the field of quantum information or physics as the trace distance and the fidelity, respectively:
[TABLE]
Where . Then we have the following propositions.
Proposition 4 For the trace distance and fidelity, we have the following relation:
[TABLE]
This relation is well known in the field of quantum information or quantum statistical physics, and this proposition is a non-commutative extension of Proposition 1 in [1].
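A small numerical illustration of a relation of this kind is given below. Since the displayed inequality is not reproduced here, the sketch checks the well-known Fuchs-van de Graaf inequalities $1-F \leq d \leq \sqrt{1-F^2}$, which may differ in form from the statement of Proposition 4. It is restricted to qubits, assuming the trace distance $d=\frac{1}{2}\mathrm{Tr}|\rho-\sigma|$ and the fidelity $F=\mathrm{Tr}|\rho^{1/2}\sigma^{1/2}|$, for which closed forms in terms of Bloch vectors $r$ and $s$ are available: $d=\frac{1}{2}|r-s|$ and $F^2=\frac{1}{2}\bigl(1+r\cdot s+\sqrt{(1-|r|^2)(1-|s|^2)}\bigr)$. The function names and test states are hypothetical.

```python
import math

def trace_distance(r, s):
    """Trace distance of two qubit states given by their Bloch vectors."""
    return 0.5 * math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))

def fidelity(r, s):
    """Fidelity of two qubit states given by their Bloch vectors (norms at most 1)."""
    dot = sum(a * b for a, b in zip(r, s))
    one_minus_r2 = max(0.0, 1.0 - sum(a * a for a in r))
    one_minus_s2 = max(0.0, 1.0 - sum(b * b for b in s))
    f_squared = 0.5 * (1.0 + dot + math.sqrt(one_minus_r2 * one_minus_s2))
    return math.sqrt(f_squared)

def satisfies_fuchs_van_de_graaf(r, s, tol=1e-12):
    """Check 1 - F <= d <= sqrt(1 - F^2) up to a rounding tolerance."""
    d = trace_distance(r, s)
    f = fidelity(r, s)
    upper = math.sqrt(max(0.0, 1.0 - f * f))
    return (1.0 - f) <= d + tol and d <= upper + tol

# Hypothetical test states: two pure states, two mixed states, identical states.
pairs = [
    ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.6), (0.3, 0.0, -0.2)),
    ((0.2, 0.1, 0.3), (0.2, 0.1, 0.3)),
]
results = [satisfies_fuchs_van_de_graaf(r, s) for r, s in pairs]
print(results)
```

For the pair of pure states the upper inequality is attained up to rounding, and for identical states the sketch gives $d=0$ and $F=1$.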
By the easy calculations such as , we have the following proposition.
Proposition 5 For the quantum Chernoff information, we have
[TABLE]
The above proposition is also non-commutative extension of Proposition 2 in the paper [1].
The quantum Pinsker inequality for the quantum relative entropy (divergence) and a similar inequality are known (see e.g., [7] and [8], respectively):
[TABLE]
and
[TABLE]
To show our final result, we use the following well-known fact. See [7] for example.
Lemma 2 Let be a state transformation. For an operator monotone decreasing function , the monotonicity holds:
[TABLE]
where is the quantum -divergence, with is the relative modular operator such as and .
Theorem 2 The quantum Jeffrey divergence defined by has the following lower bound:
[TABLE]
Proof: By Lemma 2, Proposition 4 in [1] and (which will be shown at the end of the proof), we have
[TABLE]
Here we note that is operator convex which is equivalent to operator monotone decreasing and we have , since .
Finally, we show . Let be commutative -algebra generated by , be the set of all matrices and set the map as trace preserving, conditional expectation. If we take and , then two elements and of Jordan decomposition of , are commutative functional calculus of , and we have which implies .
4 ACKNOWLEDGMENTS
The author (S. F.) was partially supported by JSPS KAKENHI Grant Number 16K05257.
5 Appendix: Added notes related to Theorem 1
Actually we have which is the usual average code length, but the definition of in Theorem 1 seems to be complicated and somewhat unnatural to understand its meaning. In order to overcome this problem, we may adopt the simple alternative definition for instead of that in Theorem 1. Then we have the following proposition.
Proposition A Let and . Then we have
[TABLE]
Where ,,, and , where the $q$-exponential function is the inverse function of the $q$-logarithmic function and its form is given in the proof of this proposition.
Proof: By the same way to the proof of Theorem 1, we have
[TABLE]
By simple computations with formula , we have
[TABLE]
since , implies thus we have , then the definition of the $q$-exponential function
[TABLE]
shows and was used. Thus we have
Unfortunately, we could not remove the needless and meaningless condition in the above proposition. It is known that the inequality holds for a uniquely decodable code and that the equality holds if the code achieves the entropy, namely [1]. In our proposition, we obtained a $q$-parametric extension, but it does not have any information theoretical meaning. We will have to consider this problem in the future.
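The inverse relation between the $q$-logarithmic and $q$-exponential functions used in this appendix can be checked with a short Python sketch. It assumes the standard forms $\ln_q(x)=(x^{1-q}-1)/(1-q)$ and $\exp_q(x)=\bigl(1+(1-q)x\bigr)^{1/(1-q)}$ on the domain where the bracket is positive; the displayed definition is not reproduced here, so these forms and the function names are assumptions.

```python
import math

def ln_q(x, q):
    """q-logarithm (assumed standard form); equals the natural logarithm at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    """q-exponential (assumed standard form), defined where 1 + (1 - q) x > 0."""
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        raise ValueError("exp_q is only evaluated where 1 + (1 - q) x > 0")
    return base ** (1.0 / (1.0 - q))

# Largest round-trip error of exp_q(ln_q(x)) over a small grid of x and q.
max_error = 0.0
for q in (0.5, 1.0, 1.5, 2.0):
    for x in (0.2, 1.0, 3.5):
        max_error = max(max_error, abs(exp_q(ln_q(x, q), q) - x))

print(max_error)
```

The round trip recovers $x$ up to rounding because $1+(1-q)\ln_q(x)=x^{1-q}$ is always positive for $x>0$.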
REFERENCES
- [1] I. Sason, Tight Bounds for Symmetric Divergence Measures and a Refined Bound for Lossless Source Coding, IEEE Trans. Inf. Theory, Vol. 61 (2015), pp. 701–707.
- [2] G. L. Gilardoni, On the minimum f-divergence for given total variation, C. R. Acad. Sci. Paris, Ser. I, Vol. 343 (2006), pp. 763–766.
- [3] C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J. Stat. Phys., Vol. 52 (1988), pp. 479–487.
- [4] I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Stud. Sci. Math. Hungarica, Vol. 2 (1967), pp. 299–318.
- [5] S. Furuichi, K. Yanagi and K. Kuriyama, Fundamental properties of Tsallis relative entropy, J. Math. Phys., Vol. 45 (2004), pp. 4868–4877.
- [6] S. Furuichi, Information theoretical properties of Tsallis entropies, J. Math. Phys., Vol. 47 (2006), 023302.
- [7] D. Petz, Quantum information theory and quantum statistics, Springer, 2004.
- [8] E. A. Carlen and E. H. Lieb, Remainder terms for some quantum entropy inequalities, J. Math. Phys., Vol. 55 (2014), 042201.
