Factorization of Dempster-Shafer Belief Functions Based on Data
Andrzej Matuszewski, Mieczysław A. Kłopotek

(Warszawa, November 1995)
ABSTRACT
One important obstacle in applying Dempster-Shafer Theory (DST) is its relationship to frequencies. In particular, there exist serious difficulties in finding factorizations of belief functions from data. In probability theory factorizations are usually related to the notion of (conditional) independence, and their possibility is tested accordingly. However, in DST conditional belief distributions prove to be non-proper belief functions (that is, ones connected with negative “frequencies”). This makes statistical testing of potential conditional independencies practically impossible, as no coherent interpretation has been found so far for negative belief function values. In this paper a novel attempt is made to overcome this difficulty. In the proposal no conditional beliefs are calculated; instead a new measure F is introduced within the framework of DST, closely related to conditional independence, which allows conventional statistical tests to be applied for detection of dependence/independence.
1 Introduction
The Dempster-Shafer (DS) Theory (DST) or the Theory of Evidence is considered by many researchers as an appropriate tool to represent various aspects of human dealing with uncertain knowledge, especially for representation of partial ignorance.
However, one particular obstacle in applying DST is its relationship to frequencies [12]. Though in general a belief function may be derived from frequencies under some particular database representation [5], there exist serious difficulties in finding factorizations of belief functions from data.
In probability theory and in classical statistics factorizations are usually related to the notion of (conditional) independence, and such a possibility is tested accordingly. However, in DST conditional belief distributions prove to be non-proper belief functions (that is, ones connected with negative “frequencies”). This makes statistical testing of potential conditional independencies practically impossible, as no coherent interpretation has been found so far for negative belief function values.
In this paper a novel attempt is made to overcome the mentioned difficulty: no conditional beliefs are calculated, but instead a new measure F is introduced within the framework of DST, closely related to conditional independence, which allows conventional statistical tests to be applied for detection of dependence/independence.
The paper is structured as follows: First, basic notions of DST are introduced. Then the problem with emerging negative beliefs is explained. The new F-measure is defined. The last section explains the way statistical tests may be used in connection with this F-measure.
2 Dempster Shafer Theory and the Concept of Conditional Independence
The Valuation Based Systems (VBS) framework, covering common concepts of probability theory, Dempster Shafer theory of evidence and, to some extent, also possibility theory, was introduced in [6]. In VBS, domain knowledge is represented by entities called *variables* and valuations. Further, two operations called *combination* and *marginalization* are defined on valuations to perform a local computational method for computing marginals of the joint valuation. The basic components of VBS can be characterized as follows.
**Valuations**
Let $\mathcal{X}=\{X_{1},X_{2},\ldots,X_{n}\}$ be a finite set of variables and let $\Theta_{i}$ be the domain (also called the frame) of the $i$-th variable, i.e. a discrete set of its possible values. If $h$ is a finite non-empty set of variables then $\Theta(h)$ denotes the Cartesian product of $\Theta_{i}$ for $X_{i}$ in $h$, i.e. $\Theta(h)=\times\{\Theta_{i}\mid X_{i}\in h\}$. For each subset $s$ of $\mathcal{X}$ there is a set $D(s)$ called the domain of a valuation. For instance, in the case of probabilistic systems $D(s)$ equals $\Theta(s)$, while under the belief function framework $D(s)$ equals the power set of $\Theta(s)$, i.e. $D(s)=2^{\Theta(s)}$. Valuations, being primitives in the VBS framework, can be characterized as mappings $\sigma:D(s)\rightarrow\mathcal{R}$ where $\mathcal{R}$ stands for a set of non-negative reals. In the sequel non-specific valuations will be denoted by lower-case Greek letters, $\rho$, $\sigma$, $\tau$, and so on. The set of all valuations will be denoted by $\mathcal{V}$, whereas $\mathcal{V}_{s}$ denotes the set of all valuations defined for the set of variables $s$.
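Within the Dempster-Shafer theory of evidence, a valuation is, interchangeably, the mass function m, the belief function Bel, the plausibility function Pl or the commonality function Q. Each of these functions can be uniquely computed from any other; in terms of m, for every set A:
$$Bel(A)=\sum_{\emptyset\neq B\subseteq A}m(B),\qquad Pl(A)=\sum_{B\cap A\neq\emptyset}m(B),\qquad Q(A)=\sum_{B\supseteq A}m(B).$$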
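The relations among m, Bel, Pl and Q can be illustrated with a minimal Python sketch. It assumes that a mass function is stored as a dictionary from focal sets (frozensets) to masses; the function names `bel`, `pl`, `q` and the toy frame {a, b, c} are illustrative assumptions, not part of the original text.

```python
# Illustrative sketch: a mass function is a dict {frozenset: mass}.
def bel(m, a):
    """Belief: total mass of non-empty focal sets contained in a."""
    return sum(v for b, v in m.items() if b and b <= a)

def pl(m, a):
    """Plausibility: total mass of focal sets intersecting a."""
    return sum(v for b, v in m.items() if b & a)

def q(m, a):
    """Commonality: total mass of focal sets containing a."""
    return sum(v for b, v in m.items() if b >= a)

# Hypothetical toy mass function on the frame {a, b, c}.
frame = frozenset("abc")
m = {frozenset("a"): 0.5, frozenset("ab"): 0.3, frame: 0.2}
A = frozenset("ab")
print(bel(m, A), pl(m, A), q(m, A))
```

For a normal mass function such as this one, the sketch also reproduces the familiar identity Pl(A) = 1 - Bel(complement of A).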
Following Shenoy [6] we distinguish three categories of valuations:
- Proper valuations, $\mathcal{P}$, represent knowledge that is partially coherent. (Coherent knowledge means knowledge that has well defined semantics.) This notion plays an important role in the theory of belief functions: a proper valuation is understood as a valuation in which $m(A)\geq 0$ everywhere.
- Normal valuations, $\mathcal{N}$, represent another kind of partially coherent knowledge. For instance, in Dempster-Shafer theory, a normal valuation is an m-function whose values sum to 1. In particular, the elements of $\mathcal{P}\cap\mathcal{N}$ are called proper normal valuations; they represent knowledge that is completely coherent, i.e. knowledge that has well-defined semantics. We speak about a proper mass function, proper belief function, proper plausibility function and proper commonality function iff the underlying m-function is both proper and normal.
- Positive normal valuations: this is a subset $\mathcal{U}_{s}$ of $\mathcal{N}_{s}$ consisting of all valuations that have unique identities in $\mathcal{N}_{s}$. For Dempster-Shafer theory this means .
Further there are two types of special valuations:
- Zero valuations represent knowledge that is internally inconsistent, i.e. knowledge whose truth value is always false; e.g., in Dempster-Shafer theory a zero valuation is a valuation that is identically zero for every set A. It is assumed that for each $s\subseteq\mathcal{X}$ there is at most one zero valuation $\zeta_{s}\in\mathcal{V}_{s}$. The set of all zero valuations is denoted by $\mathcal{Z}$.
- Identity valuations, I, represent total ignorance, i.e. lack of knowledge. In Dempster-Shafer theory an identity valuation corresponds to the so-called vacuous valuation, where $m(A)=0$ for every set A except for the whole frame.
**Combination**
By combination we understand a mapping $\otimes:\mathcal{V}\times\mathcal{V}\rightarrow\mathcal{N}\cup\mathcal{Z}$ that satisfies the following six axioms:
(C1) If $\rho\in\mathcal{V}_{r}$ and $\sigma\in\mathcal{V}_{s}$ then $\rho\otimes\sigma\in\mathcal{V}_{r\cup s}$;
(C2) $\rho\otimes(\sigma\otimes\tau)=(\rho\otimes\sigma)\otimes\tau$;
(C3) $\rho\otimes\sigma=\sigma\otimes\rho$;
(C4) If $\rho\in\mathcal{V}_{r}$ and the zero valuation $\zeta_{s}$ exists then $\rho\otimes\zeta_{s}\in\mathcal{V}_{r\cup s}$.
(C5) For each $s\subseteq\mathcal{X}$ there exists an identity valuation $\iota_{s}\in\mathcal{N}_{s}\cup\{\zeta_{s}\}$ such that for each valuation $\sigma\in\mathcal{N}_{s}\cup\{\zeta_{s}\}$, $\sigma\otimes\iota_{s}=\sigma$.
(C6) It is assumed that the set $\mathcal{N}_{\emptyset}$ consists of exactly one element denoted $\iota_{\emptyset}$.
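In practice the combination of two valuations is implemented as follows. Let $\oplus$ be a binary operation on $\mathcal{R}$. Then $(\rho\otimes\sigma)(x)=\rho(x^{\downarrow r})\oplus\sigma(x^{\downarrow s})$, where $x$ is an element from $D(r\cup s)$ and $x^{\downarrow r}$, $x^{\downarrow s}$ stand for the projection (relying upon dropping unnecessary variables) of $x$ onto the appropriate domain $D(r)$ or $D(s)$. In Dempster-Shafer theory combination corresponds to the Dempster rule of combination. We say that we combine two mass functions $m_{1}$, $m_{2}$ to obtain $m_{1}\otimes m_{2}$ iff
$$(m_{1}\otimes m_{2})(A)=c\cdot\sum_{B,C:\;B\cap C=A}m_{1}(B)\,m_{2}(C),$$
where $c$ is a normalizing constant and the focal sets $B$ and $C$ are first extended to the common set of variables.
Combination of Bel, Pl, Q is the combination of the respective m function. It is worth mentioning that we can compute the commonality function of the combination as
$$Q_{1\otimes 2}(A)=c\cdot Q_{1}(A^{\downarrow r})\cdot Q_{2}(A^{\downarrow s}).$$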
In the field of uncertain reasoning combination corresponds to aggregation of knowledge: when $\rho$ and $\sigma$ represent our knowledge about variables in subsets $r$ and $s$ of $\mathcal{X}$ then the valuation $\rho\otimes\sigma$ represents the aggregated knowledge about variables in $r\cup s$.
If $\rho\otimes\sigma$ is a zero valuation, we say that $\rho$ and $\sigma$ are inconsistent. On the other hand, if $\rho\otimes\sigma$ is a normal valuation, then we say that $\rho$ and $\sigma$ are consistent. Inconsistency in DST appears if
$$\sum_{B,C:\;B\cap C\neq\emptyset}m_{1}(B)\,m_{2}(C)=0.$$
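Dempster's rule and the inconsistency condition can be sketched in a few lines of Python. The sketch assumes, for simplicity, that both mass functions are already expressed over the same frame (the general case would first extend focal sets to the common set of variables); the names `dempster_combine` and `commonality` and the toy masses are illustrative assumptions.

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions {frozenset: mass} on one frame.

    Returns None when all mass falls on empty intersections, i.e. when
    the two mass functions are inconsistent.
    """
    raw = {}
    for b, vb in m1.items():
        for c, vc in m2.items():
            a = b & c
            if a:  # empty intersections are conflict and are dropped
                raw[a] = raw.get(a, 0.0) + vb * vc
    total = sum(raw.values())
    if total == 0:
        return None
    return {a: v / total for a, v in raw.items()}

def commonality(m, a):
    """Commonality Q(a): total mass of focal sets containing a."""
    return sum(v for b, v in m.items() if b >= a)

# Hypothetical toy example on the frame {a, b}.
m1 = {frozenset("a"): 0.6, frozenset("ab"): 0.4}
m2 = {frozenset("b"): 0.5, frozenset("ab"): 0.5}
m12 = dempster_combine(m1, m2)
print(m12)
```

In this toy case the commonality of the result is proportional to the product of the two input commonalities, which is the multiplicative property mentioned above.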
**Marginalization**
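While combination results in knowledge expansion, marginalization results in knowledge contraction. Let $s$ be a non-empty subset of $\mathcal{X}$. It is assumed that for each variable $X$ in $s$ there is a mapping $\downarrow(s-\{X\}):\mathcal{V}_{s}\rightarrow\mathcal{V}_{s-\{X\}}$, called marginalization to $s-\{X\}$ or deletion of $X$, that satisfies the six axioms below:
(M1) Suppose $\sigma\in\mathcal{V}_{s}$ and suppose $X_{1},X_{2}\in s$. Then $(\sigma^{\downarrow(s-\{X_{1}\})})^{\downarrow(s-\{X_{1},X_{2}\})}=(\sigma^{\downarrow(s-\{X_{2}\})})^{\downarrow(s-\{X_{1},X_{2}\})}$;
(M2) If the zero valuation exists, then $\zeta_{s}^{\downarrow(s-\{X\})}=\zeta_{s-\{X\}}$;
(M3) $\sigma^{\downarrow(s-\{X\})}\in\mathcal{N}$ if and only if $\sigma\in\mathcal{N}$;
(M4) If $\sigma\in\mathcal{U}$ then $\sigma^{\downarrow(s-\{X\})}\in\mathcal{U}$;
(CM1) Suppose $\rho\in\mathcal{V}_{r}$ and $\sigma\in\mathcal{V}_{s}$. Suppose $X\notin r$ and $X\in s$. Then $(\rho\otimes\sigma)^{\downarrow((r\cup s)-\{X\})}=\rho\otimes(\sigma^{\downarrow(s-\{X\})})$;
(CM2) Suppose $\sigma\in\mathcal{N}_{s}$. Suppose $r\subseteq s$ and suppose that $\iota$ is an identity for $\sigma^{\downarrow r}$. Then $\sigma\otimes\iota=\sigma$.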
Axiom M1 states that if we delete from $s$, the domain of a valuation $\sigma\in\mathcal{V}_{s}$, two variables, say $X_{1}$ and $X_{2}$, then the resulting valuation defined over the subset $s-\{X_{1},X_{2}\}$ does not depend on the order in which these variables are deleted. In particular, deleting all variables from the set $s$ we obtain the valuation $\sigma^{\downarrow\emptyset}$ whose domain is the empty set (its existence is guaranteed by axiom C6); by axiom M3 this element equals $\iota_{\emptyset}$ if and only if $\sigma$ is a normal valuation.
Axioms M2 - M4 state that the marginalization preserves coherence of knowledge.
In the Dempster-Shafer theory marginalization means summing of masses along deleted dimensions:
$$m^{\downarrow p}(A)=\sum_{B:\;B^{\downarrow p}=A}m(B)$$
where marginalization of a set of vectors B onto a subset of variables p means the set of corresponding vectors projected onto subspace p.
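Marginalization by summing masses along the deleted dimensions can be sketched as follows. The representation is an assumption made for illustration: a focal set is a frozenset of value tuples, variables are addressed by their position in the tuple, and the names `project` and `marginalize` are hypothetical.

```python
def project(focal, keep):
    """Project a set of vectors onto the variable positions listed in keep."""
    return frozenset(tuple(vec[i] for i in keep) for vec in focal)

def marginalize(m, keep):
    """Sum the masses of all focal sets that share the same projection."""
    out = {}
    for focal, mass in m.items():
        p = project(focal, keep)
        out[p] = out.get(p, 0.0) + mass
    return out

# Hypothetical mass function over the variables (X, Z).
m_xz = {
    frozenset({("p", "a"), ("p", "b")}): 0.25,
    frozenset({("q", "a"), ("q", "b")}): 0.25,
    frozenset({("p", "c")}): 0.5,
}
m_z = marginalize(m_xz, keep=[1])  # delete X, keep Z
m_x = marginalize(m_xz, keep=[0])  # delete Z, keep X
print(m_z)
print(m_x)
```

Two different focal sets over (X, Z) collapse here onto the same set {a, b} in Z, so their masses add up in the marginal.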
**Removal**
Removal, also called direct difference, is an “inverse” operation to the combination. Formally, it can be defined as a mapping $\circledR:\mathcal{V}\times(\mathcal{N}\cup\mathcal{Z})\rightarrow\mathcal{N}\cup\mathcal{Z}$ that satisfies the three axioms:
(R1) If $\sigma\in\mathcal{V}_{s}$ and $\rho\in\mathcal{N}_{r}\cup\mathcal{Z}_{r}$ then $\sigma\circledR\rho\in\mathcal{N}_{r\cup s}\cup\mathcal{Z}_{r\cup s}$.
(R2) For each $r\subseteq\mathcal{X}$ and each $\rho\in\mathcal{N}_{r}\cup\mathcal{Z}_{r}$ there exists an identity $\iota_{r}$ such that $\rho\circledR\rho=\iota_{r}$.
(CR) If $\sigma,\tau\in\mathcal{V}$ and $\rho\in\mathcal{N}\cup\mathcal{Z}$ then $(\sigma\otimes\tau)\circledR\rho=\sigma\otimes(\tau\circledR\rho)$.
Note that we can define the (pseudo)-inverse of a normal valuation by setting $\rho^{-1}=\iota_{\emptyset}\circledR\rho$.
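In the Dempster-Shafer theory, the removal is defined (by Shenoy) as
$$Q_{\sigma\circledR\rho}(A)=c\cdot\frac{Q_{\sigma}(A^{\downarrow s})}{Q_{\rho}(A^{\downarrow r})}$$
if $Q_{\rho}(A^{\downarrow r})>0$, and
$$Q_{\sigma\circledR\rho}(A)=0$$
otherwise, where c is a normalization factor for Q.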
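How removal can produce negative masses may be seen in a small Python sketch. It assumes removal is carried out as a pointwise division of commonality functions followed by normalization, with both valuations on the same frame; the helper names and the toy masses are illustrative assumptions, and the sketch further assumes the normalizing sum is non-zero.

```python
from itertools import combinations

def nonempty_subsets(frame):
    items = sorted(frame)
    return [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)]

def commonality(m, frame):
    """Q(A) for every non-empty subset A of the frame."""
    return {a: sum(v for b, v in m.items() if b >= a)
            for a in nonempty_subsets(frame)}

def mass_from_commonality(q):
    """Moebius inversion: m(A) = sum over B >= A of (-1)^|B - A| * Q(B)."""
    return {a: sum((-1) ** len(b - a) * qb for b, qb in q.items() if b >= a)
            for a in q}

def remove(m_sigma, m_rho, frame):
    """Divide commonalities where the divisor is positive, then normalize."""
    q_s, q_r = commonality(m_sigma, frame), commonality(m_rho, frame)
    q = {a: (q_s[a] / q_r[a] if q_r[a] > 0 else 0.0) for a in q_s}
    m = mass_from_commonality(q)
    total = sum(m.values())  # assumed non-zero in this sketch
    return {a: v / total for a, v in m.items()}

# Hypothetical toy example on the frame {a, b}.
frame = frozenset("ab")
m_sigma = {frozenset("a"): 0.5, frame: 0.5}
m_rho = {frozenset("b"): 0.5, frame: 0.5}
result = remove(m_sigma, m_rho, frame)
print(result)
```

In this toy case the set {b} receives a negative mass, so the result is normal but not proper; removing a valuation from itself, in contrast, yields the vacuous mass function.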
He defined conditional independence as follows: Suppose , suppose r,s,v are disjoint subsets of w. We say that r and s are conditionally independent given v with respect to , written as iff there exist and such that
[TABLE]
In the case of DST, conditional independence of sets of variables r and s given v in a belief function Bel means that there must exist (not necessarily proper and normal) “belief functions”, one defined over $r\cup v$ and the other over $s\cup v$, such that
[TABLE]
Shenoy introduces also the notion of conditional valuations and particularly of conditional belief functions based on the notion of removal.
We say that is a DS belief function Bel conditioned on the set of variables p if
[TABLE]
Furthermore, we can easily derive the conclusion that, in the case of DST, conditional independence of sets of variables r and s given v in a belief function Bel means that
[TABLE]
Shenoy writes, however [6, pp.225-226]: ”Notice that if and are commonality functions, it is possible that may not be a commonality function because condition … [of non-negativity of mass function] may not be satisfied by In fact, if is a commonality function for s, and , then even may fail to be a commonality function. This fact is the reason why we need the concept of proper valuation as distinct from non-zero and normal valuations in the general VBS framework. An implication of this fact is that conditionals may lack semantic coherence in the Dempster-Shafer’s theory. This is the primary reason why conditionals are neither natural nor widely studied in the Dempster-Shafer’s belief-function theory”.
3 The Fundamental Problem of Testing Conditional Independence in DST
Dempster-Shafer theory of evidence has been frequently criticized for its unclear relation to frequencies [12]. However, even if we have already agreed on a representational model for database-founded belief functions like that in [5], we still have serious problems with the search for conditional independence in a database.
First of all, as already stated by Shenoy (cited above), conditional belief functions are in general not coherent belief functions, hence it is impossible to formulate for them a counterpart in the world of frequencies.
Hence one can be tempted to test if the right hand side and the left hand side belief distributions in the formula
[TABLE]
agree, e.g. by an appropriate $\chi^{2}$-test of agreement between the cell frequencies of the empirical Bel distribution and the “theoretical” “expected” distribution given by the right hand side. But as this “expected” distribution may contain pseudo-belief functions as components, the whole distribution may also have negative cells and is hence impossible to use as an “expected frequency”.
One may be tempted to seek heuristically for two (proper normal) belief functions, one defined over $r\cup v$ and the other over $s\cup v$, such that
[TABLE]
so that the “expected” distribution is ensured to be proper normal by the very coherence of both factors. However, as can be seen from the example below, such factors may not exist at all.
Let us consider the belief function Bel in variables X, Y, Z having ranges X: {p,q}, Y: {r,s,t}, Z: {a,b,c}. Let the belief distribution in X, Y, Z be:
[TABLE]
It is easily checked that
[TABLE]
Let $Bel_{1}$ and $Bel_{2}$ in variables X,Z and Y,Z be two proper belief functions (that is, with non-negative m’s) such that $Bel=Bel_{1}\otimes Bel_{2}$. It cannot happen simultaneously that $Bel_{1}$ has any focal point $A$ such that $A^{\downarrow Z}$ = {a,b,c} and $Bel_{2}$ has any focal point $B$ such that $B^{\downarrow Z}$ = {a,b,c}, because $Bel$ would then have to have a focal point $C$ such that $C^{\downarrow Z}$ = {a,b,c}, which is not the case.
The above fact, due to the existing focal points, implies that EITHER $Bel_{1}$ must have focal points {(p,a),(p,b)}, {(p,b),(p,c)}, {(q,a),(q,b)} and {(q,b),(q,c)}, OR $Bel_{2}$ must have focal points {(r,a),(r,b)}, {(r,b),(r,c)}, {(s,a),(s,b),(t,a),(t,b)} and {(s,b),(s,c),(t,b),(t,c)}, OR BOTH.
Let us suppose that $Bel_{1}$ has in fact focal points {(p,a),(p,b)}, {(p,b),(p,c)}, {(q,a),(q,b)} and {(q,b),(q,c)}. Then $Bel_{2}$ must have neither {(r,a),(r,b)} nor {(r,b),(r,c)} as a focal point, because then {(p,r,b)} would be a focal point of $Bel$, which is not the case.
But then $Bel_{2}$ has to have the focal point {(r,a),(r,b),(r,c)}. Similarly, $Bel_{2}$ must have neither {(s,a),(s,b),(t,a),(t,b)} nor {(s,b),(s,c),(t,b),(t,c)} as a focal point, because then {(q,s,b),(q,t,b)} would be a focal point of $Bel$, which is not the case.
But then $Bel_{2}$ has to have the focal point {(s,a),(s,b),(s,c),(t,a),(t,b),(t,c)}. Then, however, for the belief function $Bel$ we would have:
[TABLE]
which is not the case. In this way we arrive at a contradiction. We can reason by analogy, reversing the roles of the two factors. Hence it proves impossible to obtain the decomposition in terms of proper belief functions.
4 A Solution
We define a new measure for the Dempster-Shafer theory, besides m, Bel, Pl and Q. Let r, s and v be three disjoint sets of variables. Let us restrict our considerations to only those Bel functions for which focal points are of the form:
[TABLE]
One can therefore refer to each of r, s and v as a “dimension”.
What we are now interested in is the possibility of testing dependence or independence of r and s, and then whether the dependence statement is influenced by v. If the relationship between r and s is influenced by v, we may be interested, assuming causality among r, s, v, in whether v makes r and s independent.
(Unconditional) independence between r and s alone is trivial, solvable with traditional statistical methods, as negative mass values do not emerge in the process. The interesting case is that of three variables.
Let us define the function F corresponding to a given belief function Bel as
[TABLE]
The F measure obviously differs significantly from the ordinary DST measures in that it is a mixture: it behaves like the Q-measure along the v dimension and like the m-measure along the r,s dimensions. The function F is everywhere non-negative.
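The verbal description of F (commonality-like along v, mass-like along r and s) can be turned into a small Python sketch. Since the defining formula is not reproduced here, the reading below is an assumption: focal points are taken to be product sets R × S × V stored as triples of frozensets, and F(R, S, V) is taken to be the total mass of focal points with exactly the same R and S whose v-component is a superset of V. The name `f_measure` and the toy masses are hypothetical.

```python
def f_measure(m, r, s, v):
    """Assumed reading of F: exact match on R and S, superset match on V."""
    return sum(mass for (r2, s2, v2), mass in m.items()
               if r2 == r and s2 == s and v2 >= v)

# Hypothetical mass function with product-form focal points (R, S, V).
R1, R2 = frozenset({"x1"}), frozenset({"x2"})
S1 = frozenset({"y1"})
Va, Vb, Vab = frozenset("a"), frozenset("b"), frozenset("ab")
m = {
    (R1, S1, Va): 0.2,
    (R1, S1, Vab): 0.3,
    (R2, S1, Vb): 0.5,
}
for V in (Va, Vb, Vab):
    print(sorted(V), f_measure(m, R1, S1, V), f_measure(m, R2, S1, V))
```

Under this reading F is a sum of masses, so it cannot become negative for a proper belief function, which agrees with the non-negativity stated above.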
First of all we can test (this is the main subject of the next section) whether v influences the relationship between r and s. If the relationship between the variable sets r and s is not influenced by v, then for a set R from the space r, S from the space s and V from the space v we have
[TABLE]
If the above equation is rejected, then conditional independence of r,s given v may be of interest. At this point we need to assume the existence of a causal relationship of r,s on v. If the variable sets r and s are independent given v, then for a set R from the space r, S from the space s and V from the space v we have
[TABLE]
where the dot stands for a dimension which is simply summed up (marginalized in the probabilistic sense).
Appropriate direct statistical tests are not the subject of this paper, but from the next section we can derive a stepwise procedure to check for conditional independence. The above relationship suggests that we can test for independence given the variable set v by testing, for every level of the variable set v, independence of the variable sets r and s. Notice that, in terms of frequencies, at a given level of v the objects (database records) counted in cells for different combinations of levels of variables r and s are different, though the same objects may occur on different levels of the variables v.
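The level-by-level procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: each database record is a triple (x, y, z) where z is a set of Z-values, a record is counted at level V whenever its z contains V, and independence at each level is assessed with the ordinary Pearson statistic. The function names and the records are hypothetical.

```python
from collections import Counter

def level_table(records, level):
    """Counts of (x, y) among records whose Z-set contains the given level."""
    return Counter((x, y) for x, y, z in records if z >= level)

def pearson_independence(table):
    """Pearson X^2 and degrees of freedom for independence in a two-way table."""
    rows = sorted({x for x, _ in table})
    cols = sorted({y for _, y in table})
    n = sum(table.values())
    row_tot = {x: sum(table[(x, y)] for y in cols) for x in rows}
    col_tot = {y: sum(table[(x, y)] for x in rows) for y in cols}
    x2 = 0.0
    for x in rows:
        for y in cols:
            expected = row_tot[x] * col_tot[y] / n
            x2 += (table[(x, y)] - expected) ** 2 / expected
    return x2, (len(rows) - 1) * (len(cols) - 1)

# Hypothetical records; the {a, b} records are counted at both levels.
fa, fb, fab = frozenset("a"), frozenset("b"), frozenset("ab")
records = ([("p", "r", fa)] * 10 + [("p", "s", fa)] * 10
           + [("q", "r", fab)] * 10 + [("q", "s", fab)] * 10
           + [("p", "r", fb)] * 20 + [("q", "s", fb)] * 20)
stat_a = pearson_independence(level_table(records, fa))
stat_b = pearson_independence(level_table(records, fb))
print(stat_a, stat_b)
```

Within one level every record falls into exactly one (x, y) cell, so the usual test applies there; the overlap of records between levels is what distinguishes this setting from an ordinary stratified analysis.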
The concept of the measure F allows for direct conditional independence testing for DST using known statistical procedures. In the subsequent section the details are described for the particular example of three variables: X (from r), Y (from s) and Z (from v).
5 Database Evaluation of Three-dimensional Belief Distributions
Assume that the mass function of the variable Z has K non-zero values, i.e. that for K sets in the domain of the variable Z the corresponding mass m is non-zero.
Having a database with records which are representative, in the opinion of the researcher, one can perform the traditional statistical analysis. It should be stressed, however, that this analysis depends on the database not only for practical, empirical data: some aspects of statistical analysis impose certain restrictions on a simulated database as well.
Now the problem is how to assess the structure of dependence between Z and the two-dimensional belief distribution of (X,Y).
“Realistic” sample sizes should be assured first. The sample size is the number of records which corresponds to a given value of Z. We suggest that each sample size should belong to the interval
[TABLE]
whose bounds depend on the actual number of cells of the distribution of (X,Y).
This number of cells does not always equal the product of the numbers of possible values of X and Y, since some so-called structural zeros can exist.
If for a given value of Z the number of records in the database does not belong to the interval (6), then we propose to recode the variable Z.
A basic tool for a statistical analysis is a hierarchical model corresponding to the frequencies of database records. We will test the accuracy of the following expression:
[TABLE]
where:
ln - natural logarithm,
E - (statistical) expectation
- database frequency of records having the i-th value of X, the j-th value of Y and the k-th value (or its superset) of Z.
i=1,2,…,, j=1,2,…,, k=1,2,…,K.
The configuration of parameters on the right-hand side of expression (7) has a meaningful interpretation. Generally the belief variables X and Y can be mutually dependent. The joint distribution of these two variables does not change, however, for different values of Z within the three-dimensional framework.
The parameter f, the indexed parameters and the model as a whole follow traditional statistical terminology. The indexed parameters are “contrasts”, which means that all possible marginal sums of the indexed parameters must equal zero.
The traditional way of checking the adequacy of (7) is through the $\chi^{2}$ statistic. The parameters are estimated for this purpose. The appropriate number of “degrees of freedom” must be taken into account when calculating the p-value of statistical significance.
Let us consider the following example. X and Y have only 2 possible values each. Z has 4 values and there is no impossible combination of 3-dimensional discrete vectors, i.e. no structural zeros.
There are $2\cdot 2\cdot 4=16$ degrees of freedom at the beginning. We must subtract, however, 1 degree on behalf of the constant f and additionally
$$(2-1)+(2-1)+(4-1)+(2-1)(2-1)=6$$
degrees for the subsequent indexed parameters, taking into account the marginal restrictions.
The $\chi^{2}$ statistic has in our example 9 degrees of freedom. If there were some impossible triples of (X,Y,Z), then 9 is diminished further: one degree for each of them.
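The degree-of-freedom bookkeeping of this example can be written as a short Python function. The term structure is an assumption chosen to be consistent with the text: a constant, main effects for X, Y and Z, and an X-Y interaction (the joint distribution of X and Y not changing with Z), with one further degree lost per structural zero. The function name is hypothetical.

```python
def degrees_of_freedom(n_x, n_y, n_z, structural_zeros=0):
    """Cells minus free parameters of the assumed model, minus structural zeros."""
    cells = n_x * n_y * n_z
    params = (1                        # constant f
              + (n_x - 1)              # main effect of X
              + (n_y - 1)              # main effect of Y
              + (n_z - 1)              # main effect of Z
              + (n_x - 1) * (n_y - 1)) # X-Y interaction
    return cells - params - structural_zeros

print(degrees_of_freedom(2, 2, 4))
print(degrees_of_freedom(2, 2, 4, structural_zeros=1))
```

For the 2 x 2 x 4 example this gives 16 - 7 = 9, matching the count stated in the text.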
Taking into account the “realistic” sample sizes we can obtain a first assessment of the possibility to factorize the distribution of (X,Y,Z) with respect to its last component Z.
We propose p=0.1 as the general threshold for the p-values calculated for the $\chi^{2}$ statistic.
If the actual p-value is smaller than 0.1, one can still seek a factorization by redefinition of the variable Z. The notion of standardized residual can be used for this purpose.
The $\chi^{2}$ statistic is the sum of squares of residuals of the form:
[TABLE]
in which the fitted frequency is the one obtained from model (7).
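Fitted frequencies and residuals of this kind can be sketched in Python. Two assumptions are made for illustration: the model states that the joint (X,Y) distribution is the same for every value of Z, for which the fitted cell count has the closed form (XY-margin times Z-margin divided by the table total), and the residual is taken in the Pearson form (observed minus fitted, divided by the square root of fitted). The function names and counts are hypothetical; in practice a standard statistical package would be used.

```python
def fit_xy_z(counts):
    """Fitted counts for counts[i][j][k] when (X,Y) does not depend on Z."""
    I, J, K = len(counts), len(counts[0]), len(counts[0][0])
    n = sum(counts[i][j][k] for i in range(I) for j in range(J) for k in range(K))
    xy = [[sum(counts[i][j]) for j in range(J)] for i in range(I)]
    z = [sum(counts[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    return [[[xy[i][j] * z[k] / n for k in range(K)]
             for j in range(J)] for i in range(I)]

def pearson_residuals(counts, fitted):
    """Residuals (observed - fitted) / sqrt(fitted), cell by cell."""
    return [[[(c - f) / f ** 0.5 for c, f in zip(c_row, f_row)]
             for c_row, f_row in zip(c_i, f_i)]
            for c_i, f_i in zip(counts, fitted)]

# Hypothetical 2 x 2 x 2 table of counts indexed as counts[i][j][k].
counts = [[[20, 10], [10, 20]],
          [[10, 20], [20, 10]]]
fitted = fit_xy_z(counts)
residuals = pearson_residuals(counts, fitted)
x2 = sum(r ** 2 for plane in residuals for row in plane for r in row)
print(x2)
```

Cells with the largest absolute residuals point to the values of Z at which the assumed factorization fits worst.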
To fit frequencies according to model (7), the standard statistical programs can be employed (e.g. Statistica [10], SPSS [9]).
Clearly those values of Z with the highest residuals can be starting points for redefinition of this variable. Taking into account the matrix of differences in estimated probabilities of (X,Y) for a given value of Z, one can obtain a classification that is the basis of a set of redefined variables Z', Z'', Z''', …
The new variables fulfill the appropriateness of model (7), i.e. the p-values calculated for the restricted $\chi^{2}$ statistics are higher than 0.1.
The marginals Z', Z'', Z''', … can have disjoint sets of values or not. For real data the criterion of meaningfulness must be taken into account when choosing among possible triples.
The problem of joining sets of probabilities is considered in the literature. The most recent publication of this kind is Consonni, Veronese [4].
We have assumed that it is still easier to prove the existence of differences in probabilities if certain elements of the samples are in common. Once the distribution (X,Y,Z’) has passed the test for factorization, one must confirm it.
The problem of comparing contingency tables generated by samples having some elements in common is less frequently addressed in the statistical literature. Only recently [11] was the optimal variance for the difference of two proportions calculated.
Previously the same problem was considered from a different point of view [3, 7]: missing data were allowed in the matched (paired) experiment for proportions.
Additional aspects of the procedure just described can be found in [1, 2, 8].
References
[1] Agresti A.: A survey of exact inference for contingency tables, Statistical Science, 7, 131-177, 1992.
[2] Agresti A., Kim O.: Improved exact inference about conditional associations in three-way contingency tables, JASA, 90, No. 430, 632-639, 1995.
[3] Choi S.C., Stablein D.M.: Practical tests for comparing two proportions with incomplete data, Applied Statistics, 31, 256-262, 1982.
[4] Consonni G., Veronese P.: A Bayesian method for combining results from several binomial experiments, JASA, 90, No. 431, 935-944, 1995.
[5] Kłopotek M.A.: Testumgebung für Entwicklung eines Beratungssystems auf der Basis der Mathematischen Theorie der Evidenz, Österreichische Zeitschrift für Statistik und Informatik (ZSI), 23, Heft 2, 157-180, 1994.
[6] Shenoy P.P.: Conditional independence in valuation-based systems, International Journal of Approximate Reasoning, 10, 203-234, 1994.
[7] Shih W.J.: Maximum likelihood estimation and likelihood test with incomplete pairs, Journal of Statistical Computation and Simulation, 21, 187-194, 1985.
[8] Silva Mato A., Martin Andres A.: Optimal unconditional tables for comparing two independent proportions, Biom. J., 37, No. 7, 821-836, 1995.
