Confusion matrices and rough set data analysis

Ivo Düntsch; Günther Gediga

arXiv:1902.01487·cs.LG·October 2, 2019

TL;DR

This paper explores the use of confusion matrices within the rough set data model to evaluate classifiers without relying on distributional assumptions, introducing new indices and classifiers based on rough confusion matrices.

Contribution

It introduces a novel approach combining confusion matrices with rough set theory to assess classifier quality without distributional assumptions.

Findings

01

Defined indices based on rough confusion matrices

02

Developed classifiers using rough set data analysis

03

Provided a framework for classifier evaluation without distribution assumptions

Abstract

A widespread approach in machine learning to evaluate the quality of a classifier is to cross–classify predicted and actual decision classes in a confusion matrix, also called an error matrix. A classification tool which does not assume distributional parameters, but only information contained in the data, is based on the rough set data model, which assumes that knowledge is given only up to a certain granularity. Using this assumption and the technique of confusion matrices, we define various indices and classifiers based on rough confusion matrices.

Tables

Table 1: A 2–class confusion matrix

	True value $P$	True value $N$
Predicted value $\hat{P}$	True Positive	False Positive
Predicted value $\hat{N}$	False Negative	True Negative

Table 3: A decision system

Type	Price	Guarantee	Sound	Screen	d
1	high	24 months	Stereo	76	high
2	low	6 months	Mono	66	low
3	low	12 months	Stereo	36	low
4	medium	12 months	Stereo	51	high
5	medium	18 months	Stereo	51	high
6	high	12 months	Stereo	51	low

Table 5: The granule frequency matrix

\begin{matrix} & Y_{1} & Y_{2} & \text{Sum} \\ X_{1} & 1 & 1 & 2 \\ X_{2} & 0 & 1 & 1 \\ X_{3} & 0 & 1 & 1 \\ X_{4} & 2 & 0 & 2 \\ \text{Sum} & 3 & 3 & 6 \end{matrix}

Equations

\operatorname{\mathit{Low}}_{\mathcal{X}}(Y) := \bigcup \{X \in \mathcal{X} : X \subseteq Y\}

\operatorname{\mathit{Upp}}_{\mathcal{X}}(Y) := \bigcup \{X \in \mathcal{X} : X \cap Y \neq \emptyset\}

n_{i} := \lvert Y_{i} \rvert, \quad nl_{i} := \lvert \operatorname{\mathit{Low}}(Y_{i}) \rvert, \quad nu_{i} := \lvert \operatorname{\mathit{Upp}}(Y_{i}) \rvert .

Ind(b) := \begin{cases} 0, & \text{if } b = 0, \\ 1, & \text{otherwise.} \end{cases}

\hat{Y}_{i} := \bigcup f^{-1}(Y_{i}) = \bigcup \{X_{s} : f(X_{s}) = Y_{i}\} .

n_{ij} := \lvert \hat{Y}_{i} \cap Y_{j} \rvert

X_{1} = \{1, 6\}, \quad X_{2} = \{2\}, \quad X_{3} = \{3\}, \quad X_{4} = \{4, 5\},

Y_{1} = \{1, 4, 5\}, \quad Y_{2} = \{2, 3, 6\} .

\begin{array}{cccc} & Y_{1} & Y_{2} & \text{Sum} \\ Y_{1} & 1 & 1 & 2 \\ Y_{2} & 0 & 1 & 1 \\ Y_{2} & 0 & 1 & 1 \\ Y_{1} & 2 & 0 & 2 \\ \text{Sum} & 3 & 3 & 6 \end{array}

\begin{array}{cccc} & Y_{1} & Y_{2} & \text{Sum} \\ \hat{Y}_{1} & 3 & 1 & 4 \\ \hat{Y}_{2} & 0 & 2 & 2 \\ \text{Sum} & 3 & 3 & 6 \end{array}

\gamma = \sum_{i=1}^{k} \frac{n_{i}}{n} \cdot p_{i},

\alpha_{i} = \frac{nl_{i}}{nu_{i}} = p_{i} \cdot p^{i} .

\alpha := \frac{\sum_{i} (n_{i\bullet} + n_{\bullet i} - n_{ii}) \cdot \alpha_{i}}{\sum_{i} (n_{i\bullet} + n_{\bullet i} - n_{ii})}

\alpha = \frac{\sum_{i} (n_{i\bullet} + n_{\bullet i} - n_{ii}) \cdot \alpha_{i}}{\sum_{i} (n_{i\bullet} + n_{\bullet i} - n_{ii})} = \frac{\sum_{i} n_{ii}}{\sum_{i} (n_{i\bullet} + n_{\bullet i} - n_{ii})} = \frac{\gamma}{2 - \gamma} .

X_{i} \cap f(X_{i}) \neq \emptyset .

nl_{j} \leq nl_{j}^{**} \leq nl_{j}^{*} \leq \lvert Y_{j} \rvert .

nu_{j}^{*} := n_{jj} + \sum_{i \neq j} (n_{ij} + n_{ji}) = \sum_{i \neq j} n_{ij} + \sum_{i} n_{ji} .

\lvert Y_{j} \rvert \leq nu_{j}^{*} \leq nu_{j}^{**} \leq nu_{j} .


Full text

Confusion matrices and rough set data analysis

Ivo Düntsch ${}^{1~2~3}$

Günther Gediga ${}^{1~4}$

1 The ordering of authors is alphabetical and equal authorship is implied.

2 Permanent address: Dept. of Computer Science, Brock University, St Catharines, Canada

3 College of Mathematics and Informatics, Fujian Normal University, Fuzhou, China

4 Institut für Evaluation und Marktanalysen, Brinkstr. 19, 49143 Jeggen, Germany

[email protected], [email protected]

Abstract

A widespread approach in machine learning to evaluate the quality of a classifier is to cross–classify predicted and actual decision classes in a confusion matrix, also called an error matrix. A classification tool which does not assume distributional parameters, but only information contained in the data, is based on the rough set data model, which assumes that knowledge is given only up to a certain granularity. Using this assumption and the technique of confusion matrices, we define various indices and classifiers based on rough confusion matrices.

1 Introduction

In pattern recognition and other disciplines of machine learning, the sum of the diagonal elements of a confusion matrix is widely used to measure the success of a classification based on an algorithm or human observation in comparison with a gold standard (or “true” measurement) such as classification by an expert. The main idea is that an algorithm (or an observer) forms its own hidden equivalence classes of the data, and is forced to assign the classes to the categories given by the gold standard. The underlying model may be one of a plethora of existing techniques, see e.g. [1, 2, 3]. The question may be asked whether such an index is valid for determining the quality of a classifier: Since we approximate sets, namely decision classes, one should use a theory of set approximation such as the rough set approach to investigate this question.

In a first step we find a connection between a rough set decision system and the resulting confusion matrix. We derive several approximations of upper and lower bounds of the classes given by the gold standard; additionally, we consider the standard indices of rough set analysis for the coverage. Owing to lack of space we shall only indicate the procedures; detailed results and proofs will appear elsewhere.

2 Definitions and notation

Throughout, $U$ denotes a finite nonempty set with $n$ elements. Given a set $\mathcal{Y}=\{Y_{1},\ldots,Y_{k}\}$ of decision classes, a classifier is a mapping $f:U\to\mathcal{Y}$ which predicts the class membership of an element of $U$ in a decision class. The predicted and true values of class membership can be cross–classified and counted in a confusion matrix. If the success of a classifier is measured by its error rate, confusion matrices may be used to analyse and to compare classifiers. A widely used confusion matrix of dimension two is shown in Table 1, and a general confusion matrix is shown in Table 2. An entry $\langle\hat{Y}_{i},Y_{j}\rangle=n_{ij}$ in the matrix is the number of elements of $Y_{j}$ which are predicted to be in $Y_{i}$; in particular, $\sum\{n_{ii}:1\leq i\leq k\}$ is the number of correctly classified elements.

The philosophy of rough sets is based on the assumption that knowledge of the world depends on the granularity of representation [4]. Mathematically, granularity may be expressed by an equivalence relation $\theta$ on a nonempty finite set $U$ , up to the classes of which membership in a subset of $U$ can be determined. For rough approximation, two operators are defined on $2^{U}$ in the following way: Let $\mathcal{X}:=\{X_{1},\ldots,X_{m}\}$ be the set of equivalence classes of $\theta$ . If $Y\subseteq U$ , then,

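$$\operatorname{\mathit{Low}}_{\mathcal{X}}(Y):=\bigcup\{X\in\mathcal{X}:X\subseteq Y\},$$

$$\operatorname{\mathit{Upp}}_{\mathcal{X}}(Y):=\bigcup\{X\in\mathcal{X}:X\cap Y\neq\emptyset\}. \tag{2.2}$$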
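The lower and upper approximation of a set can be sketched in a few lines. This is only an illustration, assuming granules and subsets of $U$ are represented as Python sets; the function names `lower` and `upper` are hypothetical, and the data are the granules and the decision class $Y_{1}$ that appear in Example 1.

```python
def lower(granules, Y):
    """Union of all granules that are entirely contained in Y."""
    return set().union(*(X for X in granules if X <= Y))


def upper(granules, Y):
    """Union of all granules that have at least one element in Y."""
    return set().union(*(X for X in granules if X & Y))


granules = [{1, 6}, {2}, {3}, {4, 5}]   # classes of theta (from Example 1)
Y1 = {1, 4, 5}                          # one decision class

print(sorted(lower(granules, Y1)))  # [4, 5]
print(sorted(upper(granules, Y1)))  # [1, 4, 5, 6]
```

The granule $\{1,6\}$ meets $Y_{1}$ without being contained in it, so it contributes to the upper approximation only.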

The main data type of the rough set approach is the decision system, which is closely related to a relational data table with an added decision attribute. An example is shown in Table 3; there, the object set $U$ contains six elements, there are four independent attributes, and one decision attribute $d$.

For simplicity of notation, we suppose that an attribute $a$ is a mapping from $U$ to the set $V_{a}$ of values of $a$ . Each set $Q$ of independent attributes gives rise to an equivalence relation $\theta_{Q}$ on $U$ by setting $x\theta_{Q}y$ if and only if $a(x)=a(y)$ for all $a\in Q$ . Similarly, the decision attribute $d$ induces an equivalence relation $\theta_{d}$ , the classes $\mathcal{Y}:=\{Y_{1},\ldots,Y_{k}\}$ of which are called decision classes. We cross–classify the classes of $\theta$ with the decision classes in a granule frequency matrix, see Table 4; there, $c_{j}=\lvert X_{j}\rvert$ , $n_{i}=\lvert Y_{i}\rvert$ , and $c_{ij}=\lvert X_{i}\cap Y_{j}\rvert$ . Furthermore, we introduce the following parameters for each decision class $Y_{i}$ :

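$$n_{i}:=\lvert Y_{i}\rvert,\quad nl_{i}:=\lvert\operatorname{\mathit{Low}}(Y_{i})\rvert,\quad nu_{i}:=\lvert\operatorname{\mathit{Upp}}(Y_{i})\rvert.$$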
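As a rough sketch of how the granule frequency matrix and the three parameters per decision class could be computed from a decision system, the following uses the values of the decision system with six objects shown above. The attribute set $Q=\{\text{Price},\text{Sound}\}$ is an assumption of this sketch, chosen because on these values it yields the granules used in Example 1; the names `TABLE`, `COLUMN` and `partition` are hypothetical.

```python
from collections import defaultdict

# Decision system: object -> (Price, Guarantee, Sound, Screen, d)
TABLE = {
    1: ("high", "24 months", "Stereo", 76, "high"),
    2: ("low", "6 months", "Mono", 66, "low"),
    3: ("low", "12 months", "Stereo", 36, "low"),
    4: ("medium", "12 months", "Stereo", 51, "high"),
    5: ("medium", "18 months", "Stereo", 51, "high"),
    6: ("high", "12 months", "Stereo", 51, "low"),
}
COLUMN = {"Price": 0, "Guarantee": 1, "Sound": 2, "Screen": 3, "d": 4}


def partition(attributes):
    """Equivalence classes of theta_Q, in order of first appearance."""
    classes = defaultdict(set)
    for obj, row in TABLE.items():
        classes[tuple(row[COLUMN[a]] for a in attributes)].add(obj)
    return list(classes.values())


granules = partition(["Price", "Sound"])  # assumed attribute set Q
decisions = partition(["d"])              # decision classes Y_1, Y_2

# Granule frequency matrix: c_ij = |X_i ∩ Y_j|
freq = [[len(X & Y) for Y in decisions] for X in granules]

# Parameters n_i, nl_i, nu_i for each decision class
n_i = [len(Y) for Y in decisions]
nl_i = [sum(len(X) for X in granules if X <= Y) for Y in decisions]
nu_i = [sum(len(X) for X in granules if X & Y) for Y in decisions]

print(freq)             # [[1, 1], [0, 1], [0, 1], [2, 0]]
print(n_i, nl_i, nu_i)  # [3, 3] [2, 2] [4, 4]
```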

Consider the vector $\vec{X}_{i}=\langle c_{ij}:1\leq j\leq k\rangle$ belonging to granule $X_{i}$ . If $\vec{X}_{i}$ contains only one non–zero entry, we call the granule deterministic. In this case, $X_{i}\subseteq Y_{j}$ and prediction based on $X_{i}$ is perfect. Otherwise, the granule is called indeterministic. A subset $Y$ of $U$ is called definable, if it is a union of elements of $\mathcal{X}$ .

A major aim of rough set data analysis is to decide (or estimate) membership of an element $x$ of $U$ in a decision class using the knowledge given by a set $Q$ of attributes, in particular, how well the decision classes can be approximated by the knowledge obtained from a partition induced by $Q$. Note that we can define a partial classifier $f_{r}$ as follows: If $D=\bigcup\{X_{i}:X_{i}\text{ is a deterministic class}\}$, then each $x\in D$ is correctly classified (and these are the only ones). Thus we can set $f_{r}(x)=x$ for all $x\in D$. If $x\in X_{i}$ and $x\not\in D$, then the rough method assigns $x$ to one or more upper approximations of decision classes. In this sense, rough approximation is not a point estimate. With some abuse of language, we call $f_{r}$ a rough classifier.

In the sequel, we suppose that $\mathcal{X}=\{X_{1},\ldots,X_{m}\}$ is the set of classes of a fixed equivalence relation $\theta$ on $U$, called granules, and $\mathcal{Y}=\{Y_{1},\ldots,Y_{k}\}$ is a set of decision classes; to avoid trivialities we assume that $k>1$. Lower and upper approximations are taken with respect to $\mathcal{X}$, and we shall omit the indices in the approximation functions. We shall write $Z=Z_{1}\uplus\ldots\uplus Z_{r}$ if $Z=Z_{1}\cup\ldots\cup Z_{r}$, and the sets $Z_{i}$ are pairwise disjoint. At times, we are only interested in whether the entry in a cell is $0$ or not. To this end, we introduce an indicator function $Ind:\mathbb{N}\to\{0,1\}$ defined by

$$Ind(b):=\begin{cases}0,&\text{if }b=0,\\1,&\text{otherwise.}\end{cases}$$

For the basic philosophy and tools of the rough set method the reader is invited to consult [5]. For recent developments and more advanced methods the overview [6] is an excellent source.

3 Rough confusion matrices

According to the rough set philosophy, we can only distinguish elements of $U$ up to equivalence with respect to $\theta$ , hence, we must have $f(x)=f(y)$ for any classifier $f$ whenever $x$ and $y$ are in the same granule. Thus, with some abuse of language, we call a function $f:\mathcal{X}\to\mathcal{Y}$ a (rough) classifier. The meaning of the classifier $f$ is that each element of $X_{i}$ is predicted to be in $f(X_{i})$ . Thus, we obtain the predictor sets

$$\hat{Y}_{i}:=\bigcup f^{-1}(Y_{i})=\bigcup\{X_{s}:f(X_{s})=Y_{i}\}.$$

If $\hat{Y}_{i}=\emptyset$ , then no element of $U$ is predicted to be in $Y_{i}$ by any class $X_{s}$ using $f$ . The (rough) confusion matrix of the classifier $f$ has dimension $k\times k$ , row labels $\hat{Y}_{i}$ , column labels $Y_{j}$ and, for $1\leq i,j\leq k$ , the entries

$$n_{ij}:=\lvert\hat{Y}_{i}\cap Y_{j}\rvert. \tag{3.2}$$

Thus, $n_{ij}=\sum_{f(X_{s})=Y_{i}}\lvert X_{s}\cap Y_{j}\rvert$ . Since $\mathcal{X}$ is a partition of $U$ , $n_{ii}\leq\lvert Y_{i}\rvert$ for all $1\leq i\leq k$ .

The rough confusion matrix can be obtained in several steps:

1. Write the granule frequency matrix $\mathfrak{M}$ obtained from $\mathcal{X}$ and $\mathcal{Y}$ as in Table 4.

2. Relabel the rows of $\mathfrak{M}$ by replacing $X_{i}$ with $f(X_{i})$.

3. Aggregate the frequencies of the rows with the same label according to (3.2). If $f^{-1}(Y_{j})=\emptyset$, fill the row labeled $\hat{Y}_{j}$ with $0$s.

4. Sort the rows according to the indices of their labels. The result has the form shown in Table 2.
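The steps above can be sketched as a single aggregation pass. This is an illustrative sketch only: the function name `rough_confusion` and the encoding of $f$ as a list of class indices are assumptions, and the input is the granule frequency matrix of Example 1.

```python
def rough_confusion(freq, f, k):
    """Aggregate granule frequency rows by their predicted class.

    freq[s][j] is |X_s ∩ Y_j|, and f[s] is the (0-based) index i
    of the class f(X_s) = Y_i. Returns the k x k matrix (n_ij).
    """
    # Rows whose label has an empty preimage simply stay all zero.
    conf = [[0] * k for _ in range(k)]
    for row, i in zip(freq, f):
        for j, count in enumerate(row):
            conf[i][j] += count
    return conf


freq = [[1, 1], [0, 1], [0, 1], [2, 0]]  # rows X_1..X_4, columns Y_1, Y_2
f = [0, 1, 1, 0]                         # f(X_1)=f(X_4)=Y_1, f(X_2)=f(X_3)=Y_2

print(rough_confusion(freq, f, 2))  # [[3, 1], [0, 2]]
```

Indexing the result by class index makes the relabeling and sorting steps implicit, since each row of `freq` is added directly into the row of its label.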

Example 1.

We shall use the decision system of Table 3. Let $\theta$ be the equivalence relation generated by the attributes Price and Sound. The partition generated by $\theta$ has the classes

$$X_{1}=\{1,6\},\quad X_{2}=\{2\},\quad X_{3}=\{3\},\quad X_{4}=\{4,5\},$$

and the decision classes

$$Y_{1}=\{1,4,5\},\quad Y_{2}=\{2,3,6\}.$$

We define $f:\mathcal{X}\to\mathcal{Y}$ by $f(X_{1})=f(X_{4})=Y_{1}$, and $f(X_{2})=f(X_{3})=Y_{2}$. The construction process is shown in Tables 5, 6, and 7.

Note that $f$ classifies five of the six elements of $U$ correctly, so that its success ratio is $\frac{5}{6}$, whereas $\gamma=\frac{4}{6}$. $\Box$
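The two numbers in this example can be checked with a short sketch. It assumes the success ratio is the diagonal sum of the rough confusion matrix divided by $n$, and that $\gamma$ is the share of objects lying in the lower approximation of their decision class; the variable names are illustrative.

```python
from fractions import Fraction

conf = [[3, 1], [0, 2]]                  # rough confusion matrix of f
granules = [{1, 6}, {2}, {3}, {4, 5}]
decisions = [{1, 4, 5}, {2, 3, 6}]

n = sum(map(sum, conf))

# Success ratio: correctly classified elements over all elements.
success = Fraction(sum(conf[i][i] for i in range(len(conf))), n)

# gamma: elements of deterministic granules over all elements.
nl = [sum(len(X) for X in granules if X <= Y) for Y in decisions]
gamma = Fraction(sum(nl), n)

print(success, gamma)  # 5/6 2/3
```

The gap between the two values comes from the granule $X_{1}=\{1,6\}$: it is indeterministic, so it adds nothing to $\gamma$, yet $f$ still classifies object 1 correctly.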

According to the rough set philosophy, the set $\operatorname{\mathit{Low}}(Y_{i})$ approximates the diagonal set $\hat{Y}_{i}\cap Y_{i}$ . The optimal approximation would be $\operatorname{\mathit{Low}}(Y_{i})=\hat{Y}_{i}\cap Y_{i}$ with $|\hat{Y}_{i}\cap Y_{i}|=n_{ii}$ ; in this case, $Y_{i}$ is deterministic with respect to $\mathcal{X}$ . Without knowledge of the source information system, but given the resulting confusion matrix, we obtain only $|\operatorname{\mathit{Low}}(Y_{i})|\leq n_{ii}$ . Similarly, it is easy to see that $|\operatorname{\mathit{Upp}}(Y_{i})|\geq n_{i.}+n_{.i}-n_{ii}$ .

Two statistics are of importance in the rough set literature: The rough approximation quality is the weighted sum

$$\gamma=\sum_{i=1}^{k}\frac{n_{i}}{n}\cdot p_{i},$$

and the accuracy of approximation of the decision class $Y_{i}$ is defined by the index

$$\alpha_{i}=\frac{nl_{i}}{nu_{i}}=p_{i}\cdot p^{i}.$$

Here, $p_{i}:=\frac{nl_{i}}{n_{i}}$ and $p^{i}:=\frac{n_{i}}{nu_{i}}$ are precision indices [7]. The measure $\alpha_{i}$ is the maximal (best possible) value for the approximation quality of the set $Y_{i}$ of an information system which produces the observed confusion matrix.

Note that $\gamma$ and the upper bound weighted mean value

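$$\alpha:=\frac{\sum_{i}(n_{i\bullet}+n_{\bullet i}-n_{ii})\cdot\alpha_{i}}{\sum_{i}(n_{i\bullet}+n_{\bullet i}-n_{ii})}$$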

of the $\alpha_{i}$ are linked by a strictly monotone transformation, since

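$$\alpha=\frac{\sum_{i}(n_{i\bullet}+n_{\bullet i}-n_{ii})\cdot\alpha_{i}}{\sum_{i}(n_{i\bullet}+n_{\bullet i}-n_{ii})}=\frac{\sum_{i}n_{ii}}{\sum_{i}(n_{i\bullet}+n_{\bullet i}-n_{ii})}=\frac{\gamma}{2-\gamma}.$$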

Therefore, they are interchangeable as a measure of overall approximation quality.

The $\alpha$ – accuracy is connected to the confusion matrix (and not to the underlying information system) by $\alpha_{i}=\frac{nl_{i}}{nu_{i}}=\frac{n_{ii}}{n_{i\bullet}+n_{\bullet i}-n_{ii}}$ . As $\alpha$ is a weighted mean of the $\alpha_{i}$ and $\gamma$ is a strictly monotone function of $\alpha$ , we observe that upper confusion $\gamma$ and upper confusion $\alpha$ are maximal as well.
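The link between the two indices can be checked numerically on the confusion matrix of Example 1. This sketch makes two assumptions that should be read as such: each $\alpha_{i}$ is taken from the confusion matrix as $n_{ii}/(n_{i\bullet}+n_{\bullet i}-n_{ii})$, with the same expression used as its weight, and $\gamma$ is read as the proportion of elements on the diagonal.

```python
from fractions import Fraction

conf = [[3, 1], [0, 2]]
k = len(conf)
n = sum(map(sum, conf))

row = [sum(conf[i]) for i in range(k)]                        # n_{i.}
col = [sum(conf[i][j] for i in range(k)) for j in range(k)]   # n_{.j}

# Weights n_{i.} + n_{.i} - n_{ii} and the per-class accuracies alpha_i.
w = [row[i] + col[i] - conf[i][i] for i in range(k)]
alpha_i = [Fraction(conf[i][i], w[i]) for i in range(k)]

# Weighted mean of the alpha_i, and gamma read off the diagonal.
alpha = sum(wi * ai for wi, ai in zip(w, alpha_i)) / sum(w)
gamma = Fraction(sum(conf[i][i] for i in range(k)), n)

print(alpha_i)                     # [Fraction(3, 4), Fraction(2, 3)]
print(alpha, gamma / (2 - gamma))  # 5/7 5/7
```

The denominator sums to $2n-\sum_{i}n_{ii}$, because the row totals and the column totals each add up to $n$; dividing numerator and denominator by $n$ gives $\gamma/(2-\gamma)$.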

4 Refining the rough classifier

Thus far, we have put no restrictions on the classifier function $f$ . In order to bring the concept closer to rough sets, and use more of the available information, we shall suppose in the sequel that a rough classifier satisfies the condition

$$X_{i}\cap f(X_{i})\neq\emptyset. \tag{4.1}$$

(4.1) implies that at least one element of $X_{i}$ is classified correctly by $f$ . Furthermore,

Lemma 4.1.

1. If $X_{i}\subseteq Y_{j}$, then $f(X_{i})=Y_{j}$.

2. $\operatorname{\mathit{Low}}(Y_{j})\subseteq\hat{Y}_{j}$.

3. If $n_{ii}=0$, then $n_{ij}=0$ for all $1\leq j\leq k$.

Our first task is to approximate $nl_{j}=\lvert\operatorname{\mathit{Low}}(Y_{j})\rvert$. To this end, we first consider $nl_{j}^{*}:=n_{jj}$. The cell $n_{jj}$ counts, in particular, the cardinality of the deterministic granules contained in $Y_{j}$, and thus, $nl_{j}\leq nl_{j}^{*}$. We can further remove certain entries, and define $nl_{j}^{**}:=n_{jj}-Ind\left(\sum_{i\neq j}n_{ji}\right)$. Using Lemma 4.1 it is not hard, if somewhat tedious, to show the relationships among these indices:

Theorem 4.1.

Let $1\leq j\leq k$ . Then,

$$nl_{j}\leq nl_{j}^{**}\leq nl_{j}^{*}\leq\lvert Y_{j}\rvert.$$

Not all of these inequalities need to hold if $f$ does not satisfy (4.1).
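As an illustration, the two estimates of the lower approximation size introduced before Theorem 4.1, namely $n_{jj}$ and $n_{jj}-Ind(\sum_{i\neq j}n_{ji})$, can be computed directly from a rough confusion matrix. The sketch below uses the matrix of Example 1, whose classifier satisfies (4.1); the function names are hypothetical.

```python
def ind(b):
    """Indicator: 0 if b == 0, otherwise 1."""
    return 0 if b == 0 else 1


def lower_estimates(conf):
    """For each class j return the pair (n_jj - Ind(off-diagonal row sum), n_jj)."""
    k = len(conf)
    result = []
    for j in range(k):
        off_diagonal = sum(conf[j][i] for i in range(k) if i != j)
        result.append((conf[j][j] - ind(off_diagonal), conf[j][j]))
    return result


conf = [[3, 1], [0, 2]]
print(lower_estimates(conf))  # [(2, 3), (2, 2)]
```

In this example the true values are $nl_{1}=nl_{2}=2$ and $\lvert Y_{1}\rvert=\lvert Y_{2}\rvert=3$, so both chains $2\leq 2\leq 3\leq 3$ and $2\leq 2\leq 2\leq 3$ hold.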

Turning to upper approximations, we first observe that (4.1) is equivalent to $X_{i}\subseteq\operatorname{\mathit{Upp}}(f(X_{i}))$ by (2.2), and thus, $\hat{Y}_{j}$ is a lower bound of the rough upper approximation of $Y_{j}$ , i.e. $\lvert\hat{Y}_{j}\rvert\leq nu_{j}$ . This can be sharpened as follows: Set

$$nu_{j}^{*}:=n_{jj}+\sum_{i\neq j}(n_{ij}+n_{ji})=\sum_{i\neq j}n_{ij}+\sum_{i}n_{ji}.$$

A moment’s reflection shows that $\sum_{i}n_{ji}$ adds all the cells in the partial granule frequency matrix spanned by the rows $X_{i}$ where $f(X_{i})=Y_{j}$ , and $\sum_{i\neq j}n_{ij}$ adds the entries $c_{ij}$ , where $X_{i}\cap Y_{j}\neq\emptyset$ and $f(X_{i})\neq Y_{j}$ .

If $n_{ij}\neq 0$, then $n_{ii}\neq 0$ by Lemma 4.1, and therefore, there is some $X_{s}$ such that $f(X_{s})=Y_{i}$ and $X_{s}\cap Y_{j}\neq\emptyset$, i.e. $X_{s}\subseteq\operatorname{\mathit{Upp}}(Y_{j})$. Therefore, if $n_{ij}\neq 0$, there is at least one additional element which is in $\operatorname{\mathit{Upp}}(Y_{j})$. Hence, we obtain a sharper bound by setting $nu_{j}^{**}:=nu^{*}_{j}+\sum_{i\neq j}Ind(n_{ij})$. Altogether, this leads to the following result:

Theorem 4.2.

Let $1\leq j\leq k$ . Then,

$$\lvert Y_{j}\rvert\leq nu_{j}^{*}\leq nu_{j}^{**}\leq nu_{j}.$$
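The two estimates of the upper approximation size can be sketched in the same way, again on the confusion matrix of Example 1. The first value is $n_{jj}+\sum_{i\neq j}(n_{ij}+n_{ji})$ and the second adds $\sum_{i\neq j}Ind(n_{ij})$; the function names are hypothetical.

```python
def ind(b):
    """Indicator: 0 if b == 0, otherwise 1."""
    return 0 if b == 0 else 1


def upper_estimates(conf):
    """For each class j return the pair (nu_j^*, nu_j^**)."""
    k = len(conf)
    result = []
    for j in range(k):
        others = [i for i in range(k) if i != j]
        nu_star = conf[j][j] + sum(conf[i][j] + conf[j][i] for i in others)
        nu_2star = nu_star + sum(ind(conf[i][j]) for i in others)
        result.append((nu_star, nu_2star))
    return result


conf = [[3, 1], [0, 2]]
print(upper_estimates(conf))  # [(4, 4), (3, 4)]
```

Here $nu_{1}=nu_{2}=4$, so for $Y_{2}$ the second estimate reaches the true size of the upper approximation while the first falls one short.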

Arguably, the simplest classifier that satisfies (4.1) is a maximal row classifier $f_{mrc}$ defined as follows: Consider a granule frequency matrix shown in Table 4. For each $1\leq i\leq m$ choose some $1\leq j\leq k$ such that $c_{ij}$ is maximal in $\{c_{i1},\ldots,c_{ik}\}$ . Such $j$ always exists, but the choice need not be unique. Then, set $f_{mrc}(X_{i}):=Y_{j}$ . The classifier $f_{mrc}$ satisfies (4.1), and it is well compatible with the rough set philosophy in using only information supplied by the data.
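A maximal row classifier can be sketched as a row-wise argmax over the granule frequency matrix. The tie-breaking rule used here (take the first maximal column) is an arbitrary choice made for this sketch, since the text leaves the choice open; the function name is hypothetical.

```python
def max_row_classifier(freq):
    """For each granule row, return the index of a column with maximal count.

    Ties are resolved in favour of the first maximal column.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in freq]


freq = [[1, 1], [0, 1], [0, 1], [2, 0]]  # granule frequency matrix of Example 1
print(max_row_classifier(freq))          # [0, 1, 1, 0]
```

With this tie-breaking rule the result coincides with the classifier $f$ of Example 1; breaking the tie in row $X_{1}$ the other way would assign $X_{1}$ to $Y_{2}$ instead.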

By definition, $X_{i}\subseteq\hat{Y}_{j}$ implies that $c_{ij}$ is a maximum in row $i$ . We can use this observation to establish an even sharper upper bound of $nl_{j}$ : Suppose that $\hat{Y}_{j}=\bigcup\{X_{s_{1}},\ldots,X_{s_{p}}\}$ , and consider the partial granule matrix

[TABLE]

Since a maximum of each row is in column $Y_{j}$ , it follows that $n_{jt}\leq n_{jj}$ for all $1\leq t\leq k$ , and therefore, $\max\{n_{jt}:1\leq t\leq k,t\neq j\}\leq n_{jj}$ . Setting $nl_{j}^{m}:=n_{jj}-\max\{n_{jt}:1\leq t\leq k,t\neq j\}\geq nl_{j}$ we obtain

Theorem 4.3.

$nl_{j}\leq nl_{j}^{m}\leq nl_{j}^{**}$ for all $1\leq j\leq k$.
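The index $nl_{j}^{m}$ is a one-line computation on the confusion matrix. The sketch below assumes $k>1$, as in the text, so that the maximum over the off-diagonal entries of a row is taken over a nonempty set; the function name is hypothetical.

```python
def nl_m(conf):
    """n_jj minus the largest off-diagonal entry of row j, for each class j."""
    k = len(conf)
    return [
        conf[j][j] - max(conf[j][t] for t in range(k) if t != j)
        for j in range(k)
    ]


conf = [[3, 1], [0, 2]]  # rough confusion matrix of Example 1
print(nl_m(conf))        # [2, 2]
```

For this matrix the index equals the true values $nl_{1}=nl_{2}=2$.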

Finally, we estimate the rough upper bound of $Y_{j}$ using $f_{mrc}$ . Setting $nu_{j}^{m}:=n_{jj}+\sum_{j\neq i}(n_{ji}+2\cdot n_{ij})$ , it can be shown that

Theorem 4.4.

$nl_{j}^{**}\leq nu_{j}^{m}\leq nu_{j}$ for all $1\leq j\leq k$.
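For completeness, a sketch of $nu_{j}^{m}=n_{jj}+\sum_{i\neq j}(n_{ji}+2\cdot n_{ij})$ on the confusion matrix of Example 1; the function name is hypothetical.

```python
def nu_m(conf):
    """n_jj + sum over i != j of (n_ji + 2 * n_ij), for each class j."""
    k = len(conf)
    return [
        conf[j][j] + sum(conf[j][i] + 2 * conf[i][j] for i in range(k) if i != j)
        for j in range(k)
    ]


conf = [[3, 1], [0, 2]]
print(nu_m(conf))  # [4, 4]
```

In this example both values coincide with the true sizes $nu_{1}=nu_{2}=4$ of the upper approximations.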

5 Conclusion and outlook

In this note, we have explored a connection between rough set approximation and confusion matrices, and have presented several natural indices that approximate the lower and upper bounds given by the reference standard. Owing to lack of space, we have only indicated the procedures with respect to one observer.

The next step will be to broaden the investigation to two or more observers: Each of these has internal sets $\mathcal{X}$ and $\mathcal{X}^{\prime}$ of granules which need to be reconciled to a common standard. This is related to inter–rater reliability, which is a common technique used in psychology (and AI) to gauge agreement among experts. We shall also re–interpret common statistics of rough set analysis based on rough confusion matrices. This will, in some sense, complement our earlier research on precision indices in the rough set framework [8].

References

[1] Novaković J, Veljović A, Ilić S, Papić Ž and Tomović M 2017 Theory and Applications of Mathematics & Computer Science 7 39–46

[2] Hand D J 2005 Applied Stochastic Models in Business and Industry 21 97–109 ISSN 1526-4025

[3] Caelen O 2017 Annals of Mathematics and Artificial Intelligence 81 429–450

[4] Pawlak Z 1982 Internat. J. Comput. Inform. Sci. 11 341–356

[5] Düntsch I and Gediga G 2000 Rough set data analysis: A road to non-invasive knowledge discovery (Bangor: Methodos Publishers (UK))

[6] Nguyen H and Skowron A 2013 Rough Sets and Intelligent Systems - Professor Zdzisław Pawlak in Memoriam, Vol 1 ed Skowron A and Suraj Z (Springer Verlag) pp 75–173

[7] Gediga G and Düntsch I 2001 Artificial Intelligence 132 219–234

[8] Gediga G and Düntsch I 2014 Transactions on Rough Sets Vol. XVII (Lecture Notes in Computer Science vol 8375) ed Peters J and Skowron A (Heidelberg: Springer Verlag) pp 33–47
