Kernels on fuzzy sets: an overview

Jorge Guevara; Roberto Hirata Jr; Stéphane Canu

arXiv:1907.12991·cs.LG·July 31, 2019


TL;DR

This paper provides an overview of kernels designed for fuzzy sets, defining various classes and discussing their applicability in machine learning and data science tasks involving uncertain data.

Contribution

It introduces and categorizes kernels on fuzzy sets, expanding the toolkit for similarity measures in uncertain data contexts.

Findings

1. Defined multiple classes of kernels on fuzzy sets
2. Explored applicability in machine learning tasks
3. Provided a theoretical framework for fuzzy set kernels

Abstract

This paper introduces the concept of kernels on fuzzy sets as a similarity measure for $[0, 1]$-valued functions, a.k.a. membership functions of fuzzy sets. We define the following classes of kernels: the cross product, the intersection, the non-singleton and the distance-based kernels on fuzzy sets. These kernels are applicable to machine learning and data science tasks where uncertainty in data has an ontic or epistemic interpretation.

Tables

Table 1: Different formulations of $k_{\cap}$ induced by different T-norm operators

Kernel $k_{\cap}$	T-norm
$k_{\cap_{\min}}(X, Y) = \sum_{A \in \mathcal{C}_{X,Y}} \sum_{x \in A} \min(X(x), Y(x))\, \rho(A)$	minimum
$k_{\cap_{pro}}(X, Y) = \sum_{A \in \mathcal{C}_{X,Y}} \sum_{x \in A} X(x) Y(x)\, \rho(A)$	product
$k_{\cap_{Łuk}}(X, Y) = \sum_{A \in \mathcal{C}_{X,Y}} \sum_{x \in A} \max(X(x) + Y(x) - 1, 0)\, \rho(A)$	Łukasiewicz
$k_{\cap_{Dra}}(X, Y) = \sum_{A \in \mathcal{C}_{X,Y}} \sum_{x \in A} Z(X(x), Y(x))\, \rho(A)$	Drastic

Equations

$$k_{\times}(X,Y)=\sum_{\substack{x\in\operatorname{supp}(X),\\ y\in\operatorname{supp}(Y)}}k_{1}\otimes k_{2}\big((x,X(x)),(y,Y(y))\big),$$

$$\rho(supp(X\cap Y))=\sum_{A\in\mathcal{A}}\rho(A)\,{\bf 1}_{supp(X)}(A)\,{\bf 1}_{supp(Y)}(A).$$

$$k_{\cap}(X,Y)=\sum_{A\in\mathcal{A}}\big(X\cap Y\big)(A)\,\rho(A)\,{\bf 1}_{supp(X)}(A)\,{\bf 1}_{supp(Y)}(A),$$

$$k_{\cap}(X,Y)=\sum_{A\in\mathcal{C}_{X,Y}}\Big(\sum_{x\in A}T(X(x),Y(x))\Big)\rho(A),$$

$$k_{nsk}(X,Y)=\sup_{x\in\Omega}\,T(X(x),Y(x)),$$

$$k_{nsk}^{\gamma}(X,Y)=\prod_{d=1}^{D}\exp\left(-\frac{1}{2}\frac{(m_{d}-m_{d}')^{2}}{\sigma_{d}^{2}+(\sigma_{d}')^{2}}\right),$$

Taxonomy

Topics: Fuzzy Logic and Control Systems · Fuzzy Systems and Optimization · Rough Sets and Fuzzy Logic

Full text

Kernels on fuzzy sets: an overview

Jorge Guevara
IBM Research, Sao Paulo, Brazil

Roberto Hirata Jr.
University of Sao Paulo, Sao Paulo, Brazil

Stéphane Canu
INSA de Rouen, Rouen, France

Abstract

This paper introduces the concept of kernels on fuzzy sets as a similarity measure for $[0,1]$-valued functions, a.k.a. membership functions of fuzzy sets. We define the following classes of kernels: the cross product, the intersection, the non-singleton and the distance-based kernels on fuzzy sets. These kernels are applicable to machine learning and data science tasks where uncertainty in data has an ontic or epistemic interpretation.

1 Introduction

Kernels on fuzzy sets were introduced by Guevara Díaz (2015) as a means to estimate a similarity measure between fuzzy sets with a geometrical interpretation in Reproducing Kernel Hilbert Spaces (RKHS). Fuzzy sets are a relaxed version of sets in the sense that they have $L$-valued characteristic functions, where $L$ is a complete lattice, instead of $\{0,1\}$-valued ones. For instance, if we use the unit interval for $L$, elements can have a degree of membership in such sets. In that sense, a fuzzy set is completely characterized by its membership function $X:\Omega\to[0,1]$, and the evaluation $X(x)$ for some $x\in\Omega$ can be understood as the degree of membership of $x$ in the fuzzy set with that membership function. Fuzzy sets were introduced by Lotfi A. Zadeh (Zadeh (1965)), and the concept has since been used in different areas of science.

The aim of this paper is to introduce the concept of kernels on fuzzy sets to the machine learning community. As all the computations are done using (membership) functions and some tools from fuzzy theory, we believe that this new tool will be helpful for machine learning and data science practitioners in problems where data can be better modelled with that kind of structure.

2 Why kernels on fuzzy sets?

Fuzzy sets (FS) have been widely used to model uncertainty in observational data, using either an ontic or an epistemic interpretation. Ontic, in the sense that point-wise uncertainties can be modeled by entities, i.e., FS can model set-valued attributes. Epistemic, in the sense that a FS is a model for incomplete information on single-valued attributes, i.e., a model for non-precise data. Under the ontic interpretation, fuzzy sets can be thought of as elements with some underlying probabilistic law, and hence it is possible to have concepts such as fuzzy-valued random variables (Kwakernaak (1978)). Some modelling examples of ontic FS are: a region within a gray-scale image, a frequency profile, fuzzy clusters, a convolutional kernel in deep learning, etc. From the epistemic point of view, FS can be used to model, for example, a region within an image describing the imprecisely known location of an object, a statement describing the (unknown) age of a person, or a nested set of intervals containing some unknown deterministic value (Hüllermeier (2005); Dubois (2011)).

In practical applications, membership functions can be constructed very easily, using either arbitrary functions derived from expert knowledge, for example, or data-driven approaches (from histograms or quantile functions, for example), without assuming any probabilistic law for the data generation process. However, as noted by Hüllermeier (2005), fuzzy modeling techniques are rarely used as an alternative tool by the ML community. This research attempts to fill this gap in the area of kernel methods. We use the idea of kernels in order to estimate a similarity measure between fuzzy sets. This not only gives those similarity measures a geometric interpretation in Reproducing Kernel Hilbert Spaces, but also makes all the machine learning techniques from kernel methods available for tasks where data can be modelled by fuzzy sets.

3 Kernels

Let $\mathcal{F}(\Omega)$ be the set of all FS with a membership function $X:\Omega\to[0,1]$. As fuzzy sets are completely characterized by their membership functions, we use the same capital letter to denote either a fuzzy set or its membership function, i.e., $X(x)$ denotes the degree of membership of an element $x\in\Omega$ in a fuzzy set $X\in\mathcal{F}(\Omega)$. A kernel on fuzzy sets is then a real-valued mapping defined on $\mathcal{F}(\Omega)\times\mathcal{F}(\Omega)$. In what follows we present four classes of kernels on fuzzy sets.

The cross product kernel on fuzzy sets - Let $k_{1},k_{2}$ be two real-valued kernels defined on $\Omega\times\Omega$ and $[0,1]\times[0,1]$ respectively. The cross product kernel on fuzzy sets is a function $k_{\times}:\mathcal{F}(\Omega)\times\mathcal{F}(\Omega)\to\mathbb{R}$ defined by:

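$$k_{\times}(X,Y)=\sum_{\substack{x\in\operatorname{supp}(X),\\ y\in\operatorname{supp}(Y)}}k_{1}\otimes k_{2}\big((x,X(x)),(y,Y(y))\big),$$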

where $X(x)$ and $Y(y)$ are the membership degrees of the elements $x,y\in\Omega$ in the fuzzy sets $X,Y$; $\operatorname{supp}$ denotes the support of a fuzzy set, i.e. the set $\{x\in\Omega\mid X(x)>0\}$; and the tensor product $k_{1}\otimes k_{2}:(\Omega\times[0,1])\times(\Omega\times[0,1])\to\mathbb{R}$ is defined by $k_{1}\otimes k_{2}\big((x,X(x)),(y,Y(y))\big)=k_{1}(x,y)\;k_{2}(X(x),Y(y))$. Straightforward examples of positive definite cross product kernels on fuzzy sets can be obtained by using positive definite kernels for $k_{1}$ and $k_{2}$. For example, if $k_{2}$ is the linear kernel and $k_{1}$ is also a linear kernel, we have $k_{\times}(X,Y)=\sum_{\substack{x\in\operatorname{supp}(X),\\ y\in\operatorname{supp}(Y)}}xy\,X(x)Y(y)$. If we instead set $k_{1}$ to be the RBF kernel, we have $k_{\times}(X,Y)=\sum_{\substack{x\in\operatorname{supp}(X),\\ y\in\operatorname{supp}(Y)}}\exp(-\gamma\|x-y\|^{2})X(x)Y(y)$. Another example is obtained by defining a finite measure space $(\Omega,\mathcal{A},\mu)$ and assuming that $k_{1},k_{2}$ are continuous kernel functions with finite integral; then the kernel $k_{\times}(X,Y)=\iint_{\substack{x\in\operatorname{supp}(X),\\ y\in\operatorname{supp}(Y)}}k_{1}\otimes k_{2}\big((x,X(x)),(y,Y(y))\big)\,d\mu(x)\,d\mu(y)$ is a cross product kernel on fuzzy sets. An instance of this kernel arises when we use a probability measure $\mathbb{P}$ instead of $\mu$; the resulting kernel incorporates two kinds of uncertainty modelling: fuzziness and randomness. Fuzziness enters through the membership functions, and randomness because, independently of the degree of membership of $x$ in the fuzzy set $X$, the above formulation treats the values $x$ as outcomes of a random variable with probability distribution $\mathbb{P}$.

The cross product kernel on fuzzy sets was presented by Guevara et al. (2017). It is positive definite whenever $k_{1}$ and $k_{2}$ are positive definite. Kernel $k_{\times}$ is a natural extension of the kernel on sets to the fuzzy set domain. It can be shown that $k_{\times}$ is a kind of convolution kernel (Haussler (1999)) and that, under some assumptions, it embeds probability distributions into an RKHS. This kernel was successfully used in supervised classification on datasets with attribute noise, where it was shown to be resistant to random noise injected into the values of the predictors (see Guevara et al. (2017) for the experiments).
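As an illustration, the cross product kernel with an RBF kernel for $k_{1}$ and a linear kernel for $k_{2}$ can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not the implementation used in the cited experiments: $\Omega$ is taken to be a finite set of real numbers, a fuzzy set is stored as a dictionary from elements to membership degrees, and the function names are hypothetical.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel on scalar elements of Omega (plays the role of k_1)."""
    return math.exp(-gamma * (x - y) ** 2)

def linear(a, b):
    """Linear kernel on membership degrees in [0, 1] (plays the role of k_2)."""
    return a * b

def cross_product_kernel(X, Y, k1=rbf, k2=linear):
    """Sum k1(x, y) * k2(X(x), Y(y)) over supp(X) x supp(Y).

    X and Y are dicts mapping elements to membership degrees (an assumed
    representation). Entries with degree 0 lie outside the support and
    are skipped.
    """
    total = 0.0
    for x, mx in X.items():
        if mx <= 0:
            continue
        for y, my in Y.items():
            if my <= 0:
                continue
            total += k1(x, y) * k2(mx, my)
    return total

# Two small discrete fuzzy sets on the real line (illustrative data).
X = {0.0: 0.2, 1.0: 1.0, 2.0: 0.5}
Y = {1.0: 0.7, 2.0: 0.9, 3.0: 0.0}
print(cross_product_kernel(X, Y))
```

The double loop costs $|\operatorname{supp}(X)|\cdot|\operatorname{supp}(Y)|$ kernel evaluations, which is acceptable for small supports; other choices of `k1` and `k2` can be passed in as long as both are positive definite.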

The intersection kernel on fuzzy sets - This kernel is based on the intersection operation between fuzzy sets. The main idea is to use the concept of finite decomposition of sets within a semi-ring of sets $\mathcal{S}$; for our purposes we assume that the support of each fuzzy set of interest is an element of a semi-ring of sets. To define the intersection kernel on fuzzy sets we first need the concept of a semi-ring of sets: a semi-ring of sets $\mathcal{S}$ is a subset of the power set $\mathcal{P}(\Omega)$ satisfying the following conditions: 1) $\phi\in\mathcal{S}$, where $\phi$ is the empty set; 2) $A,B\in\mathcal{S}\implies A\cap B\in\mathcal{S}$; and 3) for all $A,A_{1}\in\mathcal{S}$ such that $A_{1}\subseteq A$, there is a finite sequence of pairwise disjoint sets $A_{2},A_{3},\dots,A_{N}\in\mathcal{S}$ such that $A=\bigcup_{i=1}^{N}A_{i}$. Condition 3 is known as the finite decomposition of a set $A$. Gartner (2008) shows that a kernel $k:\mathcal{S}\times\mathcal{S}\to\mathbb{R}$ defined by $k(A,A^{\prime})=\rho(A\cap A^{\prime})$ is positive definite, where $\rho:\mathcal{S}\to[0,\infty]$ is a measure defined on a semi-ring of sets. We use the same reasoning to define a kernel based on the intersection of fuzzy sets. To that end, we denote by $\mathcal{F}_{\mathcal{S}}(\Omega)$ the set of fuzzy sets whose support is an element of a semi-ring $\mathcal{S}$, and we use the indicator function ${\bf 1}_{supp(X)}(A)$, which is one if $A\subseteq supp(X)$ and zero otherwise.
A natural way to measure the support of a fuzzy set is to use the measure $\rho$ as follows: let $\mathcal{A}\subseteq\mathcal{S}$ denote a finite system of pairwise disjoint sets and let $\mathcal{B}\subseteq\mathcal{A}$ be such that $supp(X)=\bigcup_{A\in\mathcal{B}}A$; then the measure of the support of a fuzzy set $X$ is $\rho(supp(X))=\sum_{A\in\mathcal{B}}\rho(A)=\sum_{A\in\mathcal{A}}\rho(A){\bf 1}_{supp(X)}(A)$. This analysis allows us to write the following expression for the measure of the support of the intersection of two fuzzy sets $X,Y\in\mathcal{F}_{\mathcal{S}}(\Omega)$:

$$\rho(supp(X\cap Y))=\sum_{A\in\mathcal{A}}\rho(A)\,{\bf 1}_{supp(X)}(A)\,{\bf 1}_{supp(Y)}(A).$$

The intersection kernel on fuzzy sets is then the function: $k_{\cap}:\mathcal{F}_{\mathcal{S}}(\Omega)\times\mathcal{F}_{\mathcal{S}}(\Omega)\to\mathbb{R}$ , satisfying:

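$$k_{\cap}(X,Y)=\sum_{A\in\mathcal{A}}\big(X\cap Y\big)(A)\,\rho(A)\,{\bf 1}_{supp(X)}(A)\,{\bf 1}_{supp(Y)}(A),$$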

where $\big(X\cap Y\big)(A)$ is an abuse of notation for $\sum_{x\in A}\big(X\cap Y\big)(x)$, i.e., the total contribution of the membership degrees of the elements belonging to $A$, evaluated with the membership function of $X\cap Y$. Intersections of fuzzy sets are implemented via T-norm operators, which are mappings of the form $T:[0,1]^{2}\to[0,1]$ that, for all $x,y,z\in[0,1]$, satisfy:

commutativity: $T(x,y)=T(y,x)$ ;
associativity: $T(x,T(y,z))=T(T(x,y),z)$ ;
monotonicity: $y\leq z\Rightarrow T(x,y)\leq T(x,z)$ ; and
limit condition: $T(x,1)=x$.

See Yu and Zhang (2008) and Klement et al. (2000) for additional definitions and notation. Using a T-norm operator $T$, we have the following T-norm based kernel $k_{\cap}$:

$$k_{\cap}(X,Y)=\sum_{A\in\mathcal{C}_{X,Y}}\Big(\sum_{x\in A}T(X(x),Y(x))\Big)\rho(A),$$

where, for ease of notation, we use $\mathcal{C}_{X,Y}=\{A\in\mathcal{A}\mid{\bf 1}_{supp(X)}(A){\bf 1}_{supp(Y)}(A)=1\}$. Table 1 shows several kernels $k_{\cap}(X,Y)$ derived from common T-norms.

This kernel was presented in Guevara et al. (2014). It is positive definite if the T-norm $T$ is a positive definite function.
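The T-norm based intersection kernel can be sketched as follows for a finite $\Omega$. This is only an illustrative sketch under stated assumptions: the system $\mathcal{A}$ is given explicitly as a list of pairwise disjoint blocks, $\rho$ is taken to be the counting measure, fuzzy sets are dictionaries from elements to membership degrees, and all function names are hypothetical. Only the minimum, product and Łukasiewicz T-norms are included.

```python
def t_min(a, b):
    """Minimum T-norm."""
    return min(a, b)

def t_prod(a, b):
    """Product T-norm."""
    return a * b

def t_luk(a, b):
    """Lukasiewicz T-norm."""
    return max(a + b - 1.0, 0.0)

def support(X):
    """Elements with strictly positive membership degree."""
    return {x for x, m in X.items() if m > 0}

def intersection_kernel(X, Y, blocks, rho, tnorm):
    """Sum over blocks A contained in both supports of
    (sum over x in A of T(X(x), Y(x))) * rho(A).

    `blocks` is a list of pairwise disjoint frozensets (the system A);
    `rho` maps a block to a non-negative number (assumed to be a measure).
    """
    sx, sy = support(X), support(Y)
    total = 0.0
    for A in blocks:
        if A <= sx and A <= sy:  # A belongs to C_{X,Y}
            inner = sum(tnorm(X[x], Y[x]) for x in A)
            total += inner * rho(A)
    return total

# Illustrative data: three disjoint blocks and the counting measure.
blocks = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5})]
X = {1: 0.5, 2: 1.0, 3: 0.4}
Y = {1: 0.8, 2: 0.6, 3: 0.9, 4: 0.3, 5: 0.2}
for name, T in [("min", t_min), ("product", t_prod), ("Lukasiewicz", t_luk)]:
    print(name, intersection_kernel(X, Y, blocks, len, T))
```

In this example only the blocks $\{1,2\}$ and $\{3\}$ contribute, because $\{4,5\}$ is not contained in the support of $X$. Whether the resulting kernel is positive definite depends on the chosen T-norm, as stated above.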

The non-singleton kernel on fuzzy sets - This kernel is a function $k_{nsk}:\mathcal{F}(\Omega)\times\mathcal{F}(\Omega)\to[0,1]$ defined by:

$$k_{nsk}(X,Y)=\sup_{x\in\Omega}\,T(X(x),Y(x)),$$

where $T$ is a T-norm operator and $\sup$ is the supremum. This kernel is also based on the intersection of fuzzy sets, because T-norms are used to compute the intersection between fuzzy sets. In this sense, a more general definition of this kernel is $k_{nsk}(X,Y)=\sup_{x\in\Omega}\big(X\cap Y\big)(x)$. This kernel was derived from the analysis of the interaction between non-singleton fuzzy systems and their inputs in the context of fuzzy inference; see Guevara et al. (2013) for details of that analysis. In particular, consider two tuples of fuzzy sets $X=(X_{1},\dots,X_{d},\dots,X_{D})$ and $Y=(Y_{1},\dots,Y_{d},\dots,Y_{D})$ with Gaussian membership functions, i.e. $[0,1]$-valued functions of the form $X_{d}(\cdot)=\exp\left(-\frac{1}{2}\frac{(\cdot-m_{d})^{2}}{\sigma_{d}^{2}}\right)$, where $m_{d}\in\mathbb{R}$ and $\sigma_{d}\in\mathbb{R}^{+}$ are the function parameters, and $Y_{d}$ has the same form with parameters $m_{d}'$ and $\sigma_{d}'$. Then, we proved that

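$$k_{nsk}^{\gamma}(X,Y)=\prod_{d=1}^{D}\exp\left(-\frac{1}{2}\frac{(m_{d}-m_{d}')^{2}}{\sigma_{d}^{2}+(\sigma_{d}')^{2}}\right)$$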

is a positive definite kernel. More instances of this kernel can be found in Guevara et al. (2013). Other important results are that such kernels are fuzzy equivalence relations w.r.t. a T-norm operator (Corollary $6$ in Moser (2006a)), that they are at least $T_{cos}$-transitive (Moser (2006a)), and that they can be interpreted as fuzzy logic formulas for fuzzy rules (Theorem $9$ in Moser (2006b)). This kernel was applied to supervised classification of data containing interval-valued predictors.
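The Gaussian non-singleton kernel is cheap to evaluate, since it only needs the parameters $(m_{d},\sigma_{d})$ of each membership function. The sketch below computes the product $\prod_{d}\exp\big(-\frac{1}{2}(m_{d}-m_{d}')^{2}/(\sigma_{d}^{2}+(\sigma_{d}')^{2})\big)$ and, for one dimension, compares it numerically with $\sup_{x}X(x)Y(x)$ on a grid. The comparison assumes the product T-norm, which is our choice for the illustration; the function names, the grid bounds and the parameter values are likewise hypothetical.

```python
import math

def gaussian_membership(m, sigma):
    """Gaussian membership function with centre m and width sigma."""
    return lambda x: math.exp(-0.5 * (x - m) ** 2 / sigma ** 2)

def nsk_gaussian(params_x, params_y):
    """Closed-form kernel for two tuples of Gaussian fuzzy sets.

    Each argument is a sequence of (m_d, sigma_d) pairs, one per dimension.
    """
    value = 1.0
    for (m, s), (m2, s2) in zip(params_x, params_y):
        value *= math.exp(-0.5 * (m - m2) ** 2 / (s ** 2 + s2 ** 2))
    return value

def nsk_sup_grid(m, s, m2, s2, lo=-10.0, hi=10.0, steps=20001):
    """Approximate sup_x X(x) * Y(x) (product T-norm) on a uniform grid."""
    X = gaussian_membership(m, s)
    Y = gaussian_membership(m2, s2)
    best = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        best = max(best, X(x) * Y(x))
    return best

closed = nsk_gaussian([(0.0, 1.0)], [(2.0, 0.5)])
approx = nsk_sup_grid(0.0, 1.0, 2.0, 0.5)
print(closed, approx)
```

In one dimension the two values agree up to the grid resolution, because the product of the two Gaussians attains its maximum where the summed exponent is smallest, and that minimum equals $\frac{1}{2}(m-m')^{2}/(\sigma^{2}+\sigma'^{2})$.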

Distance-based kernels on fuzzy sets - These kernels are based on the concept of distance substitution kernels (Haasdonk and Bahlmann (2004)). The main idea is to use metrics, pseudo-metrics or semi-metrics in order to define symmetric kernels. For a metric $D$ and for $x,y\in\Omega$, Haasdonk and Bahlmann (2004) defined $\langle x,y\rangle_{D}^{x_{0}}=\frac{1}{2}\big(D(x,x_{0})^{2}+D(y,x_{0})^{2}-D(x,y)^{2}\big)$, where $x_{0}$ is some arbitrary point in $\Omega$. We use the same idea to define the operation $\langle X,Y\rangle_{D}^{X_{0}}$ for $X,Y\in\mathcal{F}(\Omega)$, with $X_{0}$ an arbitrary fuzzy set in $\mathcal{F}(\Omega)$. Then, the following kernels are positive definite if $D$ is a metric between fuzzy sets:

1) $K(X,Y)=\langle X,Y\rangle_{D}^{X_{0}}$, which can be viewed as a kind of inner product kernel; 2) $K(X,Y)=\big(\alpha+\gamma\langle X,Y\rangle_{D}^{X_{0}}\big)^{\beta}$, where $\alpha,\gamma\in\mathbb{R}^{+}$ and $\beta\in\mathbb{N}$, which can be viewed as a polynomial type kernel; and 3) $K(X,Y)=\exp(-\gamma D(X,Y)^{2})$, which is a kind of Gaussian kernel. For instance, inserting the metric on fuzzy sets $D(X,X^{\prime})=\dfrac{\sum_{x\in\Omega}|X(x)-X^{\prime}(x)|}{\sum_{x\in\Omega}|X(x)+X^{\prime}(x)|}$ into the third definition, i.e. $K_{D}(X,X^{\prime})=\exp(-\lambda D(X,X^{\prime})^{2})$, gives a positive definite kernel. Further, if $D$ is not a metric but a semi-metric or pseudo-metric, it is still possible to perform machine learning with the resulting symmetric kernels (Bahlmann et al. (2002); Chapelle et al. (1999); Haasdonk and Keysers (2002); Moreno et al. (2003)). Some popular distances between fuzzy sets that could induce new kernels on fuzzy sets can be found in Bloch (1999); Rosenfeld (1985); Chaudhur and Rosenfeld (1996); Diamond et al. (1994). Distance-based kernels on fuzzy sets were applied to two-sample hypothesis testing on heterogeneous data (Guevara et al. (2015)).
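A short sketch of the distance-based construction, using the normalized distance $D$ given above on a finite $\Omega$. The dictionary representation of fuzzy sets, the function names, and the convention $D=0$ when both fuzzy sets are empty are assumptions made only for this illustration.

```python
import math

def distance(X, Y, omega):
    """D(X, Y) = sum |X(x) - Y(x)| / sum |X(x) + Y(x)| over x in omega.

    Elements missing from a dict are treated as having membership 0.
    Returning 0.0 when the denominator vanishes is an assumed convention.
    """
    num = sum(abs(X.get(x, 0.0) - Y.get(x, 0.0)) for x in omega)
    den = sum(abs(X.get(x, 0.0) + Y.get(x, 0.0)) for x in omega)
    return num / den if den > 0 else 0.0

def gaussian_kernel(X, Y, omega, lam=1.0):
    """Gaussian-type kernel exp(-lam * D(X, Y)^2)."""
    return math.exp(-lam * distance(X, Y, omega) ** 2)

def inner_product_kernel(X, Y, X0, omega):
    """Distance substitution inner product with reference fuzzy set X0."""
    return 0.5 * (distance(X, X0, omega) ** 2
                  + distance(Y, X0, omega) ** 2
                  - distance(X, Y, omega) ** 2)

# Illustrative data on a four-element universe.
omega = [1, 2, 3, 4]
X = {1: 0.2, 2: 0.8}
Y = {2: 0.5, 3: 0.5}
X0 = {4: 1.0}
print(gaussian_kernel(X, Y, omega), inner_product_kernel(X, Y, X0, omega))
```

Two fuzzy sets with disjoint supports are at distance 1 under this $D$, so their Gaussian-type kernel value is $\exp(-\lambda)$, while identical non-empty fuzzy sets give the value 1.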

4 Conclusions

In this paper we introduced the concept of kernels on fuzzy sets and presented four classes of such kernels: the cross product kernel on fuzzy sets, which extends the widely known kernel on sets to the fuzzy set domain; the intersection kernel on fuzzy sets, which uses concepts from set theory and fuzzy set theory in its definition; the non-singleton kernel on fuzzy sets, which was derived from the analysis of non-singleton fuzzy systems; and the distance-based kernels on fuzzy sets, which use the concept of distance substitution kernels. We think that these kernels are useful in contexts where data uncertainty has an ontic or epistemic interpretation. There are some successful applications of those kernels in tasks such as classification of data with attribute noise, classification of interval data and kernel hypothesis testing. However, we think that more experimental research using those kernels is needed in order to validate or extend their applicability.

Acknowledgments

The authors thank FAPESP (grant # 2015/01587-0), CNPq, CAPES, NAP eScience - PRP - USP and IBM Research Brazil for their financial support.

Bibliography

1. Bahlmann et al. [2002] C. Bahlmann, B. Haasdonk, and H. Burkhardt. Online handwriting recognition with support vector machines - a kernel approach. In Frontiers in Handwriting Recognition, 2002. Proceedings. Eighth International Workshop on, pages 49–54, 2002. doi: 10.1109/IWFHR.2002.1030883.
2. Bloch [1999] Isabelle Bloch. On fuzzy distances and their use in image processing under imprecision. Pattern Recognition, 32(11):1873–1895, 1999.
3. Chapelle et al. [1999] Olivier Chapelle, Patrick Haffner, and Vladimir N. Vapnik. Support vector machines for histogram-based image classification. Neural Networks, IEEE Transactions on, 10(5):1055–1064, 1999.
4. Chaudhur and Rosenfeld [1996] BB Chaudhur and Azriel Rosenfeld. On a metric distance between fuzzy sets. Pattern Recognition Letters, 17(11):1157–1160, 1996.
5. Diamond et al. [1994] Phil Diamond and Peter E. Kloeden. Metric spaces of fuzzy sets: theory and applications. World Scientific, 1994. ISBN 978-981-02-1731-0.
6. Dubois [2011] Didier Dubois. Ontic vs. epistemic fuzzy sets in modeling and data processing tasks. In IJCCI (NCTA), page 13, 2011.
7. Gartner [2008] Thomas Gartner. Kernels for structured data, volume 72. World Scientific, 2008. ISBN 978-3-540-00567-4.
8. Guevara et al. [2013] J. Guevara, R. Hirata, and S. Canu. Kernel functions in Takagi-Sugeno-Kang fuzzy system with nonsingleton fuzzy input. In Fuzzy Systems (FUZZ), 2013 IEEE International Conference on, pages 1–8, 2013. doi: 10.1109/FUZZ-IEEE.2013.6622409.