TL;DR
MAXENT3D_PID is a software tool that accurately computes the trivariate partial information decomposition within the maximum entropy framework, enabling detailed analysis of unique, redundant, and synergistic information components.
Contribution
This paper introduces MAXENT3D_PID, a software implementation of the Cone Programming-based measure for trivariate partial information decomposition, with detailed usage and validation experiments.
Findings
Software accurately estimates trivariate partial information components
Demonstrates effectiveness of Cone Programming approach
Provides a practical tool for information decomposition analysis
| Keys | Values | Keys | Values |
|---|---|---|---|
| 'UIX' | | 'UIYZ' | |
| 'UIY' | | 'UIXZ' | |
| 'UIZ' | | 'UIXY' | |
| 'CI' | | 'SI' | |
| Key | Value |
|---|---|
| 'Num_Err_I' | Optimality violations of |
| 'Num_Err_12' | Optimality violations of |
| 'Num_Err_13' | Optimality violations of |
| 'Num_Err_23' | Optimality violations of |
| Parameter | Description | Default Value |
|---|---|---|
| feastol | primal/dual feasibility tolerance | |
| abstol | absolute tolerance on duality gap | |
| reltol | relative tolerance on duality gap | |
| feastol_inacc | primal/dual infeasibility relaxed tolerance | |
| abstol_inacc | absolute relaxed tolerance on duality gap | |
| reltol_inacc | relaxed relative duality gap | |
| max_iter | maximum number of iterations that ECOS does | |
| Value | Description |
|---|---|
| 0 (default) | Simple Mode: pid() prints its output (Python dictionary). |
| 1 | Time Mode: In addition to what is printed when output=0, pid() prints a flag when it starts preparing the optimization problems in (1), the total time to create each problem, a flag when it calls ECOS, brief stats from ECOS of each problem after solving it (Figure 4), the total time for retrieving the results, the total time for computing the optimality violations, and the total time to store the results. |
| 2 | Detailed Time Mode: In addition to what is printed when output=0, pid() prints for each problem the time of each major step of creating the model, brief stats from ECOS of each problem after solving it, the total time of each function used for retrieving the results, the time of each major step used to compute the optimality violations, the time of each function used to obtain the final results, and the total time to store the results. |
| 3 | Detailed Optimization Mode: In addition to what is printed when output=1, pid() prints ECOS detailed stats of each problem after solving it (Figure 5). |
| Instance | Operation |
|---|---|
| XorDuplicate | i.i.d. |
| XorLoses | i.i.d. |
| XorMultiCoal | i.i.d. |
| AndDuplicate | i.i.d. |
MaxEnt3D_Pid: An Estimator for the Maximum-entropy Trivariate Partial Information Decomposition
Abdullah Makkeh
Institute of Computer Science of the University of Tartu, Tartu, Estonia
Daniel Chicharro
Department of Neurobiology, Harvard Medical School, Boston, MA, USA
Center for Neuroscience and Cognitive Systems @ UniTn, Istituto Italiano di Tecnologia, Rovereto (TN), Italy
Dirk Oliver Theis
Institute of Computer Science of the University of Tartu, Tartu, Estonia
Raul Vicente
Institute of Computer Science of the University of Tartu, Tartu, Estonia
Abstract
Chicharro [13] introduced a procedure to determine multivariate partial information measures within the maximum entropy framework, separating unique, redundant, and synergistic components of information. Makkeh, Theis, and Vicente [48] formulated the trivariate partial information measure of [13] as Cone Programming. In this paper, we present MaxEnt3D_Pid, production-quality software that computes the trivariate partial information measure based on the Cone Programming model. We describe our software in detail, explain how to use it, and perform some experiments reflecting its accuracy in estimating the trivariate partial information decomposition.
Keywords: multivariate partial information decomposition, cone programming, synergy, redundancy, Python
1 Introduction: Motivation and Significance
The characterization of dependencies within complex multivariate systems helps identify the mechanisms operating in the system and understand their function. Recent work has developed methods to characterize multivariate interactions by separating $n$-variate dependencies for different orders $n$ [1, 60, 66, 51, 54]. In particular, the work of [78, 77] introduced a framework, called Partial Information Decomposition (PID), which quantifies whether different input variables provide redundant, unique, or synergistic information about an output variable when combined with other input variables. Intuitively, inputs are redundant if each individually carries information about the same aspects of the output. Information is unique if it is not carried by any other single variable (or group of variables), and synergistic information can only be retrieved by combining several inputs.
This information-theoretic approach to studying interactions has found many applications to complex systems such as gene networks, e.g., [2, 72, 12], interactive agents, e.g., [40, 28, 3, 29], or neural processing, e.g., [50, 25, 56]. More generally, the nature of the information contained in the inputs determines the complexity of extracting it [42, 69], how robust it is to disruptions of the system [58], or how input dimensionality can be reduced without information loss [67, 6].
Despite this great potential, the applicability of the PID framework has been hindered by the lack of agreement on the definition of a suitable measure of redundancy. In particular, [33] indicated that the original measure proposed by [78] only quantifies common amounts of information, instead of shared information that is qualitatively the same. A constellation of measures has been proposed to implement the PID, e.g., [33, 10, 31, 35, 39, 14, 27], and core properties, such as requiring nonnegativity as a property of the measures, are still a subject of debate [57, 27, 38, 15].
A widespread application of the PID framework has also been limited by the lack of multivariate implementations. Some of the proposed measures were only defined for the bivariate case [33, 10, 59]. Other multivariate measures allow negative components in the PID [35, 27], which, although it may be adequate for a statistical characterization of dependencies, limits the interpretation of the information-theoretic quantities in terms of information communication [17]. Among the PID measures proposed, the maximum entropy measures of [10] have a preeminent role in the bivariate case because they provide bounds for any other measure consistent with a set of properties shared by many of the proposed measures. Motivated by this special role of the maximum entropy measures, [13] extended the maximum entropy approach to measures of the multivariate redundant information, which provide analogous bounds for the multivariate case. However, [13] did not address their numerical implementation.
In this work we present MaxEnt3D_Pid, a python module that computes a trivariate information decomposition following the maximum entropy PID of [13] and exploiting the connection with the bivariate decompositions associated with the trivariate one [14]. This is, to our knowledge, the first available implementation of the maximum-entropy PID framework beyond the bivariate case [46, 5, 48, 37]. This implementation is relevant for the theoretical development and practical use of the PID framework.
From a theoretical point of view, this implementation will provide the possibility to test the properties of the PID beyond the bivariate case. This is critical with regard to the nonnegativity property because, while nonnegativity is guaranteed in the bivariate case, for the multivariate case it has been proven that negative terms can appear in the presence of deterministic dependencies [9, 57, 15]. However, the violation of nonnegativity has only been proven with isolated counterexamples and it is not understood which properties of a system’s dependencies lead to negative PID measures.
From a practical point of view, the trivariate PID allows studying new types of distributed information that only appear beyond the bivariate case, such as information that is redundant to two inputs and unique with respect to a third [78]. This extension is significant both for directly studying multivariate systems and for data analysis [67, 11]. As mentioned above, the characterization of synergy and redundancy in multivariate systems is relevant for a broad range of fields that encompass social and biological systems. So far the PID has particularly found applications in neuroscience, e.g., [64, 73, 30, 56, 55, 26]. For data analysis, the quantification of multivariate redundancy can be applied to dimensionality reduction [6] or to better understand how representations emerge in neural networks during learning [65, 62]. Altogether, this software promises to significantly contribute to the refinement of the information-theoretic tools it implements and also to foster their widespread application to analyze data from multivariate systems.
2 Models and software
This section starts by briefly describing the mathematical model of the problem. Then it discusses the architecture of MaxEnt3D_Pid. It closes by explaining in detail how to use the software.
2.1 Maximum Entropy Decomposition Measure
Consider $X$, $Y$, and $Z$ as the sources and $T$ as the target of some system. Let $P$ be the joint distribution of $(T,X,Y,Z)$ and $MI(T;S)$ be the mutual information of $T$ and $S$, where $S$ is any nonempty subset of $(X,Y,Z)$. The PID decomposes $MI(T;X,Y,Z)$ into finer parts, namely, synergistic, unique, unique redundant, and redundant information. These finer parts respect certain identities [78], e.g., a subset of them sums up to $MI(T;S)$ (all identities are explained in Appendices A and C). Following the maximum entropy approach [10], this decomposition is obtained by solving the following optimization problems
[TABLE]
where
[TABLE]
and $\Delta$ is the set of all joint distributions of $(T,X,Y,Z)$. The four minimization problems in (1) can be formulated as exponential cone programs, a special case of convex optimization. We refer the reader to [48] for a brief introduction to cone programs, in particular the exponential ones. The full details on how to formulate (1) as exponential cone programs and their convergence properties are explained in [47, Chapter 5].
On its own, MaxEnt3D_Pid returns the synergistic information and the unique information collectively. In addition, with the help of the bivariate solver [46] (used in a specific way), the finer synergistic and unique information can also be extracted. Hence, the presented model obtains all the trivariate PID quantities. The full details for recovering the finer parts can be found in Appendices C and D.
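The mutual information between the target and a subset of the sources, which the decomposition splits into finer parts, can be computed directly from a joint distribution stored as a dictionary. The following is a minimal sketch, not part of the package: the helper names `marginal` and `mutual_information` are hypothetical, the keys are assumed to be tuples ordered as (t, x, y, z), and logarithms are taken in base 2 (bits) as an assumption.

```python
from collections import defaultdict
from math import log2

def marginal(pdf, idx):
    """Sum a joint pdf {outcome_tuple: prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, p in pdf.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)

def mutual_information(pdf, target_idx, source_idx):
    """Mutual information (bits) between coordinates target_idx and source_idx."""
    joint = marginal(pdf, target_idx + source_idx)
    p_t = marginal(pdf, target_idx)
    p_s = marginal(pdf, source_idx)
    k = len(target_idx)
    mi = 0.0
    for outcome, p in joint.items():
        if p > 0:
            mi += p * log2(p / (p_t[outcome[:k]] * p_s[outcome[k:]]))
    return mi

# Toy example (assumed key order (t, x, y, z)): T = X xor Y, Z an independent coin.
pdf = {(x ^ y, x, y, z): 1 / 8 for x in (0, 1) for y in (0, 1) for z in (0, 1)}
mi_all = mutual_information(pdf, (0,), (1, 2, 3))  # all three sources together
mi_x = mutual_information(pdf, (0,), (1,))         # source X alone
```

In this toy example the three sources jointly determine the target, while X alone carries no information about it, which is the kind of contrast the PID components are meant to explain.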
2.2 Software Architecture and Functionality
MaxEnt3D_Pid is implemented using the standard Python syntax. The module uses the optimization software ECOS [22] to solve the several optimization problems needed to compute the trivariate PID. To install the module, the ECOS Python package has to be installed [21], and then the files MAXENT3D_PID.py, TRIVARIATE_SYN.py, TRIVARIATE_UNQ.py, and TRIVARIATE_QP.py must be downloaded from the GitHub repository [49].
MaxEnt3D_Pid has two Python classes, Solve_w_ECOS and QP. Class Solve_w_ECOS receives the marginal distributions of $(T,X)$, $(T,Y)$, and $(T,Z)$ as Python dictionaries. These distributions are used by the Solve_w_ECOS sub-classes Opt_I and Opt_II to solve the optimization problems of (1a) and (1b), respectively. The class QP is used to recover the solution of any of the optimization problems of (1) when Solve_w_ECOS fails to obtain a solution of good quality. Figure 1 gives an overview of how these two classes interact.
2.2.1 The Subclasses Opt_I and Opt_II
The sub-classes Opt_I and Opt_II formulate the problems (1), use ECOS to get the optimal values, and compute their violations of the optimality certificates. They return the optimal values and their optimality violations. These violations are quality measures of the obtained PID. Figure 1 describes this process within the class Solve_w_ECOS. Note that both sub-classes optimize conditional entropy functionals; however, the different number of arguments leads to differences in how the problems are fitted into the cone program and how the optimal solution is retrieved. Hence, they are split into different classes.
2.2.2 The Class QP
Class QP acts if Solve_w_ECOS returns values of a subset of (1) with high optimality violations. It improves the errant values by best fitting them using Quadratic Programming such that the PID identities (13) are respected.
2.3 Using MaxEnt3D_Pid
The process of computing the PID is packed in the function pid(). This function takes as input the distribution of $(T,X,Y,Z)$ via a Python dictionary where the tuples are keys and their associated probability is the value of the key, see Figure 2. The function formulates and solves the problems of (1) using Solve_w_ECOS, and if needed uses QP to improve the solution. The function pid() returns a Python dictionary, explained in Table 1 and Table 2, containing the PID of $(T,X,Y,Z)$ in addition to the optimality violations.
The function pid() has three other optional inputs. The first optional input is called parallel (default value is parallel='off') and determines whether the process will be parallelized. If parallel='off', then the process is done sequentially, i.e., the four problems of (1) are formulated and solved one after the other. Their optimality violations are also computed consecutively, and then the final results are obtained. In contrast, when parallel='on', the formulation of the four problems (1) is done in parallel. The four problems are solved simultaneously, and finally the optimality violations along with the final results are computed in parallel. Thus, when parallel='on' there are three sequential steps (formulating the problems, solving them, and obtaining the final results), as opposed to parallel='off', which requires at least twelve sequential steps.
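As an illustration of the input format described above, the sketch below builds a dictionary whose keys are outcome tuples and whose values are probabilities. The three-input parity gate is chosen here only as an example and is not one of the instances of Table 5; the key order (t, x, y, z) and the commented-out import are assumptions that should be checked against Figure 2 and the repository.

```python
from itertools import product
from math import isclose

def parity_gate_pdf():
    """Joint distribution with T = X xor Y xor Z and uniform, independent inputs."""
    pdf = {}
    for x, y, z in product((0, 1), repeat=3):
        # Assumed key order: (t, x, y, z).
        pdf[(x ^ y ^ z, x, y, z)] = 1.0 / 8.0
    return pdf

pdf = parity_gate_pdf()

# A valid input must be a probability distribution.
total_probability = sum(pdf.values())
is_normalized = isclose(total_probability, 1.0)

# Hypothetical call, assuming MAXENT3D_PID.py is on the Python path and
# pid() accepts the dictionary as its first argument:
# from MAXENT3D_PID import pid
# result = pid(pdf)
```

Checking normalization before calling the solver is a cheap way to catch malformed inputs early.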
The second optional input is a dictionary which allows the user to tune the tolerances controlling the optimization routines of ECOS listed in Table 3.
In this dictionary, the user only sets the parameters that will be tuned. For example, if the user wants to achieve high accuracy, then the parameters abstol and reltol should be small and the parameter max_iter should be high (e.g. 1000). Figure 3 shows how to modify the parameters. In this case the solver will take longer to return the solution. For further details about parameter tuning, see [48].
The third optional input is called output, and it controls what pid() prints on the user's screen. This optional input is explained in Table 4.
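A dictionary of this kind might be assembled as in the sketch below. The parameter names are those of Table 3; the helper `make_solver_args` is hypothetical, the tolerance values are placeholders chosen for illustration rather than package defaults, and how the dictionary is handed to pid() is left to Figure 3.

```python
# Parameter names as listed in Table 3.
ECOS_PARAMETERS = {
    "feastol", "abstol", "reltol",
    "feastol_inacc", "abstol_inacc", "reltol_inacc",
    "max_iter",
}

def make_solver_args(**overrides):
    """Return a dict holding only the parameters the user wants to tune."""
    unknown = set(overrides) - ECOS_PARAMETERS
    if unknown:
        raise KeyError(f"unknown ECOS parameter(s): {sorted(unknown)}")
    return dict(overrides)

# Illustrative high-accuracy settings: small tolerances, many iterations.
# The tolerance values below are placeholders, not recommendations.
high_accuracy = make_solver_args(abstol=1e-12, reltol=1e-12, max_iter=1000)
```

Rejecting unknown names up front avoids silently passing a misspelled tolerance that the solver would ignore.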
3 Illustrations
This section shows some performance tests of MaxEnt3D_Pid on three types of instances. We will describe each type of instances and show the results of testing MaxEnt3D_Pid for each one of them. The first two types, paradigmatic and Copy gates, are used as validation and memory tests. The last type, random probability distributions, is used to evaluate the accuracy and efficiency of MaxEnt3D_Pid in computing the trivariate partial information decomposition. The machine used comes with Intel(R) Core(TM) i7-4790K CPU (4 cores) and 16GB of RAM. Only the computations of the last type were done using parallelization.
3.1 Paradigmatic Gates
As a first test, we use some trivariate PIDs that are known and have been studied previously [32]. These examples are the logic gates collected in Table 5. For these examples the decomposition can be derived analytically and thus they serve to check the numerical estimations.
3.1.1 Testing
The test is implemented in test_gates.py. MaxEnt3D_Pid returns, for all gates, the same values as [32, Table 1] up to a precision error of order . The slowest solving time (not in parallel) is 1 millisecond.
3.2 Copy Gate
As a second test, we use the Copy gate example to examine the simulation of large systems. We show how the solver handles large systems in terms of speed and reliability.
The Copy gate is the mapping of , chosen uniformly at random, to where The size of the joint distribution of scales as where In our test, and where .
Since and are independent, it is easy to see that the only nonzero quantities are for .
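A Copy gate instance can be generated programmatically. The sketch below assumes that the target is the tuple of the three inputs, that the inputs are independent and uniform over ranges of the given sizes, and that dictionary keys are ordered (t, x, y, z); the function name `copy_gate_pdf` is hypothetical and the test script test_copy_gate.py may construct its instances differently.

```python
from itertools import product

def copy_gate_pdf(size_x, size_y, size_z):
    """Return {(t, x, y, z): prob} with t = (x, y, z) and uniform inputs."""
    p = 1.0 / (size_x * size_y * size_z)
    return {
        ((x, y, z), x, y, z): p
        for x, y, z in product(range(size_x), range(size_y), range(size_z))
    }

# Small illustrative instance; the support has one entry per input triple.
pdf = copy_gate_pdf(2, 3, 4)
support_size = len(pdf)
```

Because the target copies the inputs, the support grows with the product of the input alphabet sizes, which is what makes this gate a convenient stress test.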
3.2.1 Testing
The test is implemented in test_copy_gate.py. The slowest solving time was 100 sec and the worst deviation from the actual values was .
3.3 Random Probability Distributions
As a last example we use joint distributions of sampled uniformly at random over the probability space, to test the accuracy of the solver. The size of , , and is fixed to 2 whereas varies in . For each 500 joint distributions of are sampled.
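One way to draw a joint distribution uniformly at random over the probability simplex is to normalize independent unit-rate exponential variables, which yields a flat Dirichlet sample. The sketch below uses this standard construction; it is an assumption that it matches the sampling procedure used in the experiments, and the alphabet sizes, seed, and function name `random_pdf` are illustrative.

```python
import random
from itertools import product

def random_pdf(sizes, rng):
    """Sample a joint pdf over the given alphabet sizes from a flat Dirichlet."""
    outcomes = list(product(*(range(s) for s in sizes)))
    # Normalized i.i.d. Exp(1) draws are uniformly distributed on the simplex.
    weights = [rng.expovariate(1.0) for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}

rng = random.Random(0)  # fixed seed so the illustration is reproducible
# Illustrative sizes for (t, x, y, z); three alphabets of size 2 and one of size 3.
samples = [random_pdf((2, 2, 2, 3), rng) for _ in range(5)]
```

Each sampled dictionary has the same key structure as the input of pid(), so a batch like `samples` could be fed to the solver one instance at a time.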
3.3.1 Testing
As increases, the average value of and of decrease while that of increases. In Figure 6, the accuracy of the optimization is reflected in the low divergence from zero obtained for the unique information and . In Figure 7, the time shows a constant trend and the highest time value recorded is sec.
4 Summary and discussion
In this work we presented MaxEnt3D_Pid, a python module that computes a trivariate decomposition based on the Partial Information Decomposition (PID) framework of [78], in particular following the maximum entropy PID of [13] and exploiting the connection with the bivariate decompositions associated with the trivariate one [14]. This is, to our knowledge, the first available implementation extending the maximum-entropy PID framework beyond the bivariate case [46, 5, 48, 37].
The PID framework allows decomposing the information that a group of input variables has about a target variable into redundant, unique, and synergistic components. For the bivariate case, this results in a decomposition with four components, quantifying the redundancy, synergy, and unique information of each of the two inputs. In the multivariate case, finer parts appear which do not correspond to purely redundant or unique components. For example, the redundancy components of the multivariate decomposition can be interpreted based on local unfoldings when a new input is added, with each redundancy component unfolding into a component also redundant with the new variable and a component of unique redundancy with respect to it [13]. The PID analysis can qualitatively characterize the distribution of information beyond the standard mutual information measures [36] and has already been proven useful to study information in multivariate systems e.g. [44, 74, 4, 36, 55, 56, 41, 29, 18, 63].
However, the definition of suitable measures to quantify synergy and redundancy is still a subject of debate. Among all proposed PID measures, the maximum entropy measures [10] have a preeminent role in the bivariate case because they provide bounds to any other alternative measures that share fundamental properties related to the notions of redundancy and unique information. [13] generalized the maximum entropy approach by proposing multivariate definitions of redundant information and showing that these measures implement the local unfolding of redundancy via hierarchically related maximum entropy constraints. The package MaxEnt3D_Pid efficiently implements the constrained information minimization operations involved in the calculation of the trivariate maximum-entropy PID. In Section 2, we described the architecture of the software, presented in detail the main function of the software that computes the PID along with its optional inputs, and described how to use it. In Section 3, we provided examples which verified that the software produces correct results on paradigmatic gates, showed how the software scales with large systems, and reflected the accuracy of the software in estimating the PID.
The possibility to calculate a trivariate decomposition of the mutual information represents a qualitative extension of the PID framework that goes beyond an incremental extension of the bivariate case, both regarding its theoretical development and its applicability. From a theoretical point of view, regarding the maximum-entropy approach, the multivariate case requires the introduction of new types of constraints in the information minimization that do not appear in the bivariate case [13, and Section 2]. More generally, the trivariate decomposition allows further studying one of the key unsolved issues in the PID formulation, namely the requirement of nonnegativity of the PID measures in the multivariate case.
In particular, [33] indicated that the original measure proposed by [78] only quantifies common amounts of information, and required new properties for the PID measures, to quantify qualitatively and not quantitatively how information is distributed. However, for the multivariate case these properties have been proven to be incompatible with guaranteeing nonnegativity, by using some counterexamples [9, 57, 15]. This led some subsequent proposals to define PID measures that either focus on the bivariate case [33, 10] or do not require nonnegativity [35, 27]. A multivariate formulation is desirable because the notions of synergy and redundancy are not restrained to the bivariate case, while nonnegativity is required for an interpretation of the measures in terms of information communication [17] and not only as a statistical description of the probability distributions. MaxEnt3D_Pid will allow systematically exploring when negative terms appear, beyond the currently studied isolated counterexamples. Furthermore, it has been shown that in those counterexamples negative terms result from the criterion used to assign identity to different pieces of information when deterministic relations exist [15]. Therefore, a systematic analysis of the appearance of negative terms will provide a better understanding of how information identity is assigned when quantifying redundancy, which is fundamental to assess how the PID measures conform to the corresponding underlying concepts.
From a practical point of view, the trivariate decomposition allows studying qualitatively new types of distributed information, identifying finer parts of the information that the inputs have about the target, such as information that is redundant to two inputs and unique with respect to a third [78]. This is particularly useful when examining multivariate representations, such as the interactions between several genes [2, 23] or characterizing the nature of coding in neural populations [52, 53]. Furthermore, exploiting the connection between the bivariate and the trivariate decompositions due to the invariance of redundancy to context [14], MaxEnt3D_Pid also allows estimating the finer parts of the synergy component (Appendix D). This also offers a substantial extension in the applicability of the PID framework, in particular for the study of dynamical systems [24, 16]. In particular, a question that requires a trivariate decomposition is how information transfer is distributed among multivariate dynamic processes. Information transfer is commonly quantified with the measure called transfer entropy [61, 71, 34, 70], which calculates the conditional mutual information between the current state of a certain process and the past of another process, given the past of the first process and of any other processes that may also influence those two. In this case, by construction, the PID analysis should operate with three inputs corresponding to the pasts of the two processes and of the other influencing processes. Transfer entropy is widely applied to study information flows between brain areas to characterize dynamic functional connectivity [68, 76, 75], and characterizing the synergy, redundancy, and unique information of these flows can provide further information about the degree of integration or segregation across brain areas [20].
More generally, the availability of software implementing the maximum entropy PID framework beyond the bivariate case promises to be useful in a wide range of fields in which interactions in multivariate systems are relevant, spanning the domain of social [28, 19] and biological sciences [23, 66, 12, 56]. Furthermore, the PID measures can also be used as a tool for data analysis and to characterize computational models. This comprises dimensionality reduction via synergy or redundancy minimization [69, 6], the study of generative networks that emerge from information maximization constraints [43, 8], or explaining the representations in deep networks [62].
The MaxEnt3D_Pid package presents several differences and advantages with respect to other software packages currently available to implement the PID framework. Regarding the maximum-entropy approach, other packages only compute bivariate decompositions [46, 5, 48, 37]. The dit package [37] also implements several other PID measures, including bivariate implementations for the measures of [39] and [33]. Among the multivariate decompositions, the ones using the measures of [78] or [7] can readily be calculated with standard estimators of the mutual information. However, the former, as discussed above, only quantifies common amounts of information, while the latter is only valid for a certain type of data, namely multivariate Gaussian distributed data. Software to estimate multivariate pointwise PIDs is also available [35, 27, 45] (the Ince software is located at https://github.com/robince/partial-info-decomp and the Finn and Lizier software at http://jlizier.github.io/jidt/). However, as mentioned above, these measures by construction allow negative components, which may not be desirable for the interpretation of the decomposition and limit their applicability for data analysis [6]. Altogether, MaxEnt3D_Pid is the first software that implements the mutual information PID framework via hierarchically related maximum entropy constraints, extending the bivariate case by efficiently computing the trivariate PID measures.
Computational details
The results in this paper were obtained using Python 3.6.7 and the conic solver ECOS 2.0.4. Python and all its packages are available at https://www.python.org/.
Acknowledgments
This research was supported by the Estonian Research Council, ETAG (Eesti Teadusagentuur), through PUT Exploratory Grant #620. D.C. was supported by the Fondation Bertarelli. R.V. also thanks the financial support from ETAG through the personal research grant PUT1476. We also gratefully acknowledge funding by the European Regional Development Fund through the Estonian Center of Excellence in IT, EXCITE.
Appendix A Williams-Beer PID Framework
In order to decompose $MI(T;S)$, where $T$ is the target and $S$ are the sources, [78] defined a set of axioms leading to what is known as the redundancy lattice (Figure 8). These axioms and the lattice form the framework for partial information decomposition (PID) upon which all the existing definitions of PID are formulated.
A.1 Williams-Beer Axioms
Suppose that a source is a subset of and a collection is a set of sources. A shorthand notation inspired by [13] will be used to represent the collection of sources, for example, if the system is then the collection of sources will be denoted as . [78] defined the following axioms that redundancy should comply with:
- Symmetry (S): Redundancy is invariant to the order of the sources in the collection.
- Self-redundancy (SR): The redundancy of a collection formed by a single source is equal to the mutual information of that source.
- Monotonicity (M): Adding sources to a collection can only decrease the redundancy of the resulting collection, and redundancy is kept constant when adding a superset of any of the existing sources.
A.2 The Redundancy Lattice
[78] defined a lattice formed from the collections of sources. They used (M) to define the partial ordering between the collections. The axiom (S) reflects the fact that each atom of the lattice will represent a partial information decomposition quantity. More importantly, not all the collections of sources will be considered as atoms since adding a superset of any source to the examined system does not change redundancy, i.e. , (M). The set of collections of sources included in the lattice which will form its atoms is defined as:
[TABLE]
where is the power set of . For this set of collections (atoms), the partial ordering relation that constructs the redundancy lattice is
[TABLE]
i.e., for two collections $\alpha$ and $\beta$, $\alpha \preceq \beta$ if for each source in $\beta$ there is a source in $\alpha$ that is a subset of that source. In Figure 8, the bivariate and trivariate redundancy lattices are shown.
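The construction of the lattice described above can be checked by brute force for small systems. The sketch below enumerates the collections in which no source is a superset of another and implements the partial ordering as stated in words; the function names are hypothetical and the enumeration is only practical for two or three sources.

```python
from itertools import combinations

def nonempty_subsets(items):
    """All nonempty subsets (candidate sources) of the given variables."""
    items = list(items)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def lattice_atoms(variables):
    """Collections of sources in which no source strictly contains another."""
    subsets = nonempty_subsets(variables)
    atoms = []
    for r in range(1, len(subsets) + 1):
        for coll in combinations(subsets, r):
            # `a < b` is the strict-subset test on frozensets.
            if all(not a < b for a in coll for b in coll):
                atoms.append(frozenset(coll))
    return atoms

def precedes(alpha, beta):
    """alpha precedes beta: every source in beta contains some source of alpha."""
    return all(any(a <= b for a in alpha) for b in beta)

bivariate = lattice_atoms("XY")
trivariate = lattice_atoms("XYZ")
```

The enumeration yields 4 atoms for two sources and 18 for three, matching the known sizes of the bivariate and trivariate redundancy lattices.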
A.3 Defining PID over the Redundancy lattice
The mutual information decomposition was constructed in [78] by implicitly defining partial information measures associated with each node of the redundancy lattice (Figure 8), such that the redundancy measures are obtained as
[TABLE]
where refers to the set of collections lower than or equal to in the partial ordering, and hence reachable descending from in the lattice .
Appendix B Bivariate Partial Information Decomposition
Let $T$ be the target random variable, $X$ and $Y$ be the two source random variables, and $P$ be the joint probability distribution of $(T,X,Y)$. The PID captures the synergistic, unique, and redundant information as follows:
- The synergistic information between $X$ and $Y$ about $T$, namely, $CI(T;X,Y)$.
- The redundant information of $X$ and $Y$ about $T$, namely, $SI(T;X,Y)$.
- The unique information of $X$ about $T$, namely, $UI(T;X\backslash Y)$.
- The unique information of $Y$ about $T$, namely, $UI(T;Y\backslash X)$.
This decomposition, using the Williams-Beer axioms, yields these identities:
[TABLE]
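To illustrate how such identities tie the four bivariate components together, the sketch below uses the standard bivariate consistency relations: the joint mutual information is the sum of all four components, and the mutual information of each single source with the target is the redundancy plus that source's unique information. It is an assumption that these coincide with the identities displayed above; the function name is hypothetical. Given the three mutual information values and one unique information term, the remaining components follow.

```python
def complete_bivariate_pid(mi_joint, mi_x, mi_y, unique_x):
    """Recover the remaining components from the consistency identities.

    Assumed identities:
      mi_joint = synergy + redundancy + unique_x + unique_y
      mi_x     = redundancy + unique_x
      mi_y     = redundancy + unique_y
    """
    redundancy = mi_x - unique_x
    unique_y = mi_y - redundancy
    synergy = mi_joint - redundancy - unique_x - unique_y
    return {
        "synergy": synergy,
        "redundancy": redundancy,
        "unique_x": unique_x,
        "unique_y": unique_y,
    }

# XOR gate with uniform inputs: neither source alone is informative,
# so all the joint information (1 bit) is synergistic.
xor = complete_bivariate_pid(mi_joint=1.0, mi_x=0.0, mi_y=0.0, unique_x=0.0)
```

This is why a single optimization result, combined with ordinary mutual information values, is enough to pin down the whole bivariate decomposition.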
Given the generic structure of the PID framework, [10] (BROJA) defined PID measures considering the following polytope:
[TABLE]
where $\Delta$ is the set of all joint distributions of $(T,X,Y)$. [10] (BROJA) used the maximum entropy decomposition over $\Delta_P$ in order to quantify the above quantities. Moreover, BROJA made the following assumptions.
Assumption B.1 (Lemma 3 [10]).
On the bivariate redundancy lattice (Figure 8), the following assumptions must hold to quantify the PID:
1. All partial information measures of the redundancy lattice are nonnegative.
2. The terms and are constant on .
3. The synergistic term, namely, vanishes on upon minimizing the mutual information .
Under the above assumptions and using the maximum entropy decomposition, BROJA defined the following optimization problems that compute the PID quantities.
[TABLE]
where is the co-information of and defined as . Note that [13] proved that (7c) is equivalent to
[TABLE]
B.1 Mutual Information over Bivariate Redundancy Lattice
This subsection writes down some mutual information quantities in terms of redundancy lattice partial information measures using (4). These formulas will be used in the following subsection to verify that the measures defined in (7) quantify the desired partial information quantities. will be the sum of partial information measure on every node of as follows:
[TABLE]
The mutual information between one source and the target is expressed as
[TABLE]
The mutual information between one source and the target, conditioned on knowing the other source, is expressed as
[TABLE]
The co-information is expressed as
[TABLE]
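The left-hand sides of the expressions in this subsection are ordinary information quantities and can be evaluated directly from a joint distribution. The following minimal sketch does so for an AND gate, using the standard definition of co-information as mutual information minus conditional mutual information. The function names and the coordinate convention (target first, then the two sources) are assumptions made for this example.

```python
from collections import defaultdict
from itertools import product
from math import log2

def marginal(pmf, axes):
    """Sum a pmf {outcome tuple: prob} down to the given coordinate positions."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[a] for a in axes)] += p
    return dict(out)

def entropy(pmf, axes):
    """Shannon entropy in bits of the marginal on `axes`."""
    return -sum(p * log2(p) for p in marginal(pmf, axes).values() if p > 0)

def mi(pmf, a, b, cond=()):
    """Conditional mutual information I(A; B | C) in bits, computed from entropies."""
    a, b, c = tuple(a), tuple(b), tuple(cond)
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))

# Coordinates: 0 = target T, 1 = source X, 2 = source Y; T = X and Y.
P = {(x & y, x, y): 0.25 for x, y in product((0, 1), repeat=2)}

mi_joint = mi(P, (0,), (1, 2))            # MI(T; X, Y)
mi_x = mi(P, (0,), (1,))                  # MI(T; X)
mi_y = mi(P, (0,), (2,))                  # MI(T; Y)
mi_x_given_y = mi(P, (0,), (1,), (2,))    # MI(T; X | Y)
mi_y_given_x = mi(P, (0,), (2,), (1,))    # MI(T; Y | X)
coinfo = mi_x - mi_x_given_y              # co-information of T, X, and Y

# Chain rule and symmetry of the co-information in the two sources.
assert abs(mi_joint - (mi_y + mi_x_given_y)) < 1e-12
assert abs(coinfo - (mi_y - mi_y_given_x)) < 1e-12
```

The co-information of the AND gate is negative, which is consistent with reading it as redundancy minus synergy.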
B.2 Verification of BROJA Optimization
This subsection verifies that the measures defined in (7) quantify the desired partial information quantities under the maximum entropy decomposition principle. Under Assumption B.1, the following statements hold:
- .
- and .
So, it is easy to see that
[TABLE]
Now, implies that , thus
[TABLE]
Hence, under assumptions B.1,
[TABLE]
Appendix C Maximum Entropy Decomposition of Trivariate PID
Let $T$ be the target random variable, $X$, $Y$, and $Z$ be the source random variables, and $P$ be the joint probability distribution of $(T, X, Y, Z)$. Using maximum entropy, [13] decomposes the mutual information into synergistic, unique, unique redundant, and redundant information. In this decomposition,
- the synergistic quantity, , captures the sum of all individual synergistic terms, namely, ,
- the unique information, , captures the sum of the information that has about solely, , and the information knows redundantly with the synergy of , for all ,
- the unique redundant information, , captures the actual unique information that and have redundantly about , for all ,
- and the redundant information, , captures the actual redundant information of and about , i.e., .
Using Beer-Williams axioms the decomposition yields these identities:
[TABLE]
and is the set of all joint distributions of . The measure uses the maximum entropy decomposition over to compute the above quantities. Moreover, [13] makes several assumptions about the partial information measures of the redundancy lattice.
Assumption C.1 (Assumption a.1 and Assumption a.2 in [13]).
On the trivariate redundancy lattice (Figure 8), the following assumptions are made to quantify the PID:
1. All partial information measures of the redundancy lattice are nonnegative.
2. The terms and for all are invariant on .
3. The summands for all are invariant on .
4. The terms and for all are not constant on .
5. All synergistic terms, and for all , vanish at the minimum over .
6. The partial information measures for all vanish at the minimum over .
Under the above assumptions and using the maximum entropy decomposition, [13] defines the following optimization problems that compute the PID quantities.
[TABLE]
where
[TABLE]
C.1 Mutual Information Over the Trivariate Redundancy Lattice
This subsection expresses several mutual information quantities in terms of the trivariate redundancy lattice’s partial information measures using (4). The verification that the optimization defined in (14) quantifies the desired partial information quantities is discussed in detail by [13] and is therefore skipped. These formulas are nevertheless needed later, when discussing how to compute the individual PID terms using a hierarchy of BROJA and [13] PID decompositions.
The joint mutual information $MI(T; X, Y, Z)$ is the sum of the partial information measures on every node of the redundancy lattice, as follows:
[TABLE]
The mutual information between two sources (jointly) and the target is expressed as
[TABLE]
The mutual information between one source and the target is expressed as
[TABLE]
The mutual information between two sources (jointly) and the target, conditioned on knowing the other source, is expressed as
[TABLE]
The mutual information between one source and the target, conditioned on knowing only one of the other sources, is expressed as
[TABLE]
The mutual information between one source and the target, conditioned on knowing the other sources, is expressed as
[TABLE]
The co-information of two sources and the target is expressed as
[TABLE]
The co-information of one source, two sources (jointly), and the target is expressed as
[TABLE]
The co-information of two sources (jointly), two sources (jointly), and the target is expressed as
[TABLE]
The co-information of two sources and the target, conditioned on knowing the other source, is expressed as
[TABLE]
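As a sanity check on the kinds of quantities listed in this subsection, the sketch below evaluates one representative of each from a joint distribution, here the three-bit parity gate. It only computes the information-theoretic left-hand sides, not the lattice terms. The helper names and the coordinate convention (target first, then the three sources) are assumptions of this example.

```python
from collections import defaultdict
from itertools import product
from math import log2

def marginal(pmf, axes):
    """Sum a pmf {outcome tuple: prob} down to the given coordinate positions."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[a] for a in axes)] += p
    return dict(out)

def entropy(pmf, axes):
    """Shannon entropy in bits of the marginal on `axes`."""
    return -sum(p * log2(p) for p in marginal(pmf, axes).values() if p > 0)

def mi(pmf, a, b, cond=()):
    """Conditional mutual information I(A; B | C) in bits, computed from entropies."""
    a, b, c = tuple(a), tuple(b), tuple(cond)
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))

T, X, Y, Z = 0, 1, 2, 3
# T = X xor Y xor Z with independent uniform source bits.
P = {(x ^ y ^ z, x, y, z): 0.125 for x, y, z in product((0, 1), repeat=3)}

quantities = {
    "MI(T; X,Y,Z)":  mi(P, (T,), (X, Y, Z)),
    "MI(T; X,Y)":    mi(P, (T,), (X, Y)),
    "MI(T; X)":      mi(P, (T,), (X,)),
    "MI(T; X,Y | Z)": mi(P, (T,), (X, Y), (Z,)),
    "MI(T; X | Y)":  mi(P, (T,), (X,), (Y,)),
    "MI(T; X | Y,Z)": mi(P, (T,), (X,), (Y, Z)),
    # Co-information as mutual information minus conditional mutual information.
    "CoI(T; X; Y)":     mi(P, (T,), (X,)) - mi(P, (T,), (X,), (Y,)),
    "CoI(T; X; Y | Z)": mi(P, (T,), (X,), (Z,)) - mi(P, (T,), (X,), (Y, Z)),
}
for name, value in quantities.items():
    print(f"{name:18s} {value:+.3f}")

# Chain rule: MI(T; X,Y,Z) = MI(T; X) + MI(T; Y | X) + MI(T; Z | X,Y).
chain = mi(P, (T,), (X,)) + mi(P, (T,), (Y,), (X,)) + mi(P, (T,), (Z,), (X, Y))
assert abs(quantities["MI(T; X,Y,Z)"] - chain) < 1e-12
```

For parity, every quantity that leaves at least one source unknown and unconditioned is zero, which makes it a convenient test distribution when checking an implementation.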
Appendix D Separating Trivariate PID quantities of Maximum Entropy Decomposition PID
In Appendix C, the maximum entropy decomposition for trivariate PID returns a synergistic term, which is the sum of all individual synergy quantities, and a unique term, which is the sum of the unique and unique redundancy quantities. This section shows how to use the maximum entropy decomposition for bivariate PID to obtain each individual synergy quantity as well as each individual unique and unique redundancy quantity.
Let $T$ be the target random variable, $X$, $Y$, and $Z$ be the source random variables, and $P$ be the joint probability distribution of $(T, X, Y, Z)$. BROJA is now applied to some subsystems of $(T, X, Y, Z)$, namely, (One Singled source) and (Two Double sources) for all . Consider the following probability polytopes over which the optimization is carried out:
[TABLE]
Note that for all .
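The subsystems can be built mechanically by merging primary sources into composite ones. The sketch below assumes that a one-singled-source subsystem pairs one source with the other two taken jointly, and that a two-double-sources subsystem pairs two source pairs sharing one primary source. This yields three subsystems of each kind, matching the six bivariate PIDs counted at the end of this appendix. `regroup` and the coordinate convention are hypothetical and not part of MAXENT3D_PID.

```python
from collections import defaultdict
from itertools import product

def regroup(pmf, first, second):
    """Coarse-grain a pmf over (t, x, y, z) to a pmf over (t, s1, s2), where s1
    and s2 are tuples of the source coordinates listed in `first` and `second`."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        s1 = tuple(outcome[i] for i in first)
        s2 = tuple(outcome[i] for i in second)
        out[(outcome[0], s1, s2)] += p
    return dict(out)

SOURCES = (1, 2, 3)  # coordinate positions of X, Y, Z; the target sits at position 0

# One singled source: one primary source against the other two taken jointly.
single = [((i,), tuple(s for s in SOURCES if s != i)) for i in SOURCES]
# Two double sources: two pairs that share the primary source i.
double = [((i, j), (i, k)) for i in SOURCES
          for j, k in [tuple(s for s in SOURCES if s != i)]]
subsystems = single + double

# Example joint distribution: T = X xor Y xor Z with uniform source bits.
P = {(x ^ y ^ z, x, y, z): 0.125 for x, y, z in product((0, 1), repeat=3)}
bivariate_inputs = [regroup(P, first, second) for first, second in subsystems]

for (first, second), pmf in zip(subsystems, bivariate_inputs):
    print(first, second, len(pmf), sum(pmf.values()))
```

Each regrouped distribution has a target and exactly two (possibly composite) sources, so it can be handed to any bivariate BROJA estimator.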
D.1 One Singled Source Subsystems
These subsystems have the form where , and . Applying the BROJA decomposition to the subsystem , its four PID quantities are defined as follows:
[TABLE]
Note that the marginal distribution is fixed. This implies that the mutual information and are invariant over . Therefore, the summands are fixed. Moreover, from Assumption C.1 and the fact that the marginal is fixed, the redundancy is invariant over . Thus, in addition to assumption 2, the following partial information measures are invariant over :
1. since the marginal is fixed.
2. since and are invariant over .
3. since and are invariant over .
Thus, using Assumption C.1 and the definition of over the redundancy lattice,
[TABLE]
The synergy is evaluated as
[TABLE]
The unique information of is evaluated as
[TABLE]
The unique information of is evaluated as
[TABLE]
When then
[TABLE]
The shared information of and is evaluated as
[TABLE]
Hence the BROJA decomposition of the subsystem is
[TABLE]
Whence the BROJA decompositions of the subsystems and are
[TABLE]
and
[TABLE]
D.2 Two Double Sources Subsystems
These subsystems have the form where and . Applying the BROJA decomposition to the subsystem , its four PID quantities are defined as follows:
[TABLE]
Note that the and marginal distributions are fixed. Then, and are invariant over for and . Therefore, and are fixed. Moreover, from Assumption C.1 and the two fixed and marginals, the redundancies and are invariant over . Therefore, in addition to assumption 2, the following partial information measures are invariant:
1. since the marginal is fixed.
2. since the marginal is fixed.
3. since and are invariant.
4. since and are invariant over .
5. since and are invariant over .
6. since and are invariant over .
7. since and are invariant over .
Thus, using Assumption C.1 and the definition of over the redundancy lattice,
[TABLE]
The synergy is evaluated as
[TABLE]
The unique information of is evaluated as
[TABLE]
The unique information of is evaluated as
[TABLE]
When then
[TABLE]
The shared information of and is evaluated as
[TABLE]
Hence the BROJA decomposition of the subsystem is
[TABLE]
Whence the BROJA decomposition of the subsystem is
[TABLE]
and that of is
[TABLE]
D.3 Synergy of Three Double Sources System
Consider the system of the form . The sources here are called composite since they are compositions of the primary sources and . The measures of [13], based on the maximum entropy decomposition (14), can capture the synergy of composite sources but do not break down contributions that involve unique or redundant information of composite sources. The optimization is taken over the polytope:
[TABLE]
In this polytope, and are invariant for all . Therefore, in addition to assumption 2, the following partial information measures are invariant:
1. since the marginal is fixed.
2. since the marginal is fixed.
3. since the marginal is fixed.
4. since the and marginals are fixed.
5. since the and marginals are fixed.
6. since the and marginals are fixed.
7. since the and marginals are fixed.
8. since and are invariant over .
9. since and are invariant over .
10. since and are invariant over .
11. since , , , , and are invariant over .
12. since , , , , and are invariant over .
13. since , , , , and are invariant over .
Hence the only partial information measure which is not fixed is and
[TABLE]
The synergy is evaluated as
[TABLE]
D.4 Computing the Finest parts of the Trivariate PID
The values of and can be extracted from the unique information of the subsystems of the form and of the system .
The synergy of the system , the synergy of the system , the synergy of the subsystems of the form , and the synergy of the subsystems of the form together form the following system of equations, from which the individual synergistic quantities can be recovered:
[TABLE]
Therefore, computing the finest trivariate PID quantities requires a hierarchy of one maximum entropy trivariate PID (Appendix C), six bivariate PIDs (Appendices D.1 and D.2), and a single additional optimization (Appendix D.3). This hierarchy is scripted in the file test_trivariate_finer_parts.py in the MaxEnt3D_Pid GitHub repository.
References
- [1] S. Amari. Information geometry on hierarchy of probability distributions. IEEE Transactions on Information Theory, 47(5):1701–1711, 2001.
- [2] D. Anastassiou. Computational analysis of the synergy among multiple interacting genes. Molecular Systems Biology, 3:83, 2007.
- [3] N. Ay, R. Der, and M. Prokopenko. Information-driven self-organization: The dynamical system approach to autonomous robot behavior. Theory in Biosciences, 131(3):125–127, 2012.
- [4] P. K. Banerjee and V. Griffith. Synergy, redundancy, and common information. arXiv:1509.03706v1, 2015.
- [5] P. K. Banerjee, J. Rauh, and G. Montúfar. Computing the unique information. arXiv:1709.07487v2, 2018.
- [6] P. K. R. Banerjee and G. Montúfar. The variational deficiency bottleneck. arXiv:1810.11677, 2018.
- [7] A. B. Barrett. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Physical Review E, 91:052802, 2015.
- [8] J. A. Bell and T. J. Sejnowski. An information maximisation approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159, 1995.
