Towards Logical Specification of Statistical Machine Learning

Yusuke Kawamoto

arXiv:1907.10327·cs.LO·July 19, 2023

TL;DR

This paper presents a novel logical framework for formalizing and analyzing statistical properties of machine learning classifiers, including performance, robustness, and fairness, using epistemic and counterfactual logic.

Contribution

It introduces a formal model based on Kripke structures and develops logical formulas to express and relate statistical properties of classifiers, including new formalizations of robustness and fairness.

Findings

01 Relationships among classifier properties are established.

02 Robustness-related properties are identified and formalized.

03 Counterfactual knowledge is used to formalize fairness.

Tables

Table 1: Logical description of the table of confusion

| | Actual class positive: $h_{\ell}(x)$ | Actual class negative: $\neg h_{\ell}(x)$ | $\mathsf{Prevalence}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}\mathop{\mathbb{P}_{I}}(h_{\ell}(x))$ | $\mathsf{Accuracy}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}\mathop{\mathbb{P}_{I}}(\psi_{\ell}(x)\leftrightarrow h_{\ell}(x))$ |
|---|---|---|---|---|
| Positive prediction: $\psi_{\ell}(x)$ | $\mathit{tp}(x)\stackrel{\mathrm{def}}{=}\psi_{\ell}(x)\wedge h_{\ell}(x)$ | $\mathit{fp}(x)\stackrel{\mathrm{def}}{=}\psi_{\ell}(x)\wedge\neg h_{\ell}(x)$ | $\mathsf{Precision}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}\psi_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}h_{\ell}(x)$ | $\mathsf{FDR}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}\psi_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}\neg h_{\ell}(x)$ |
| Negative prediction: $\neg\psi_{\ell}(x)$ | $\mathit{fn}(x)\stackrel{\mathrm{def}}{=}\neg\psi_{\ell}(x)\wedge h_{\ell}(x)$ | $\mathit{tn}(x)\stackrel{\mathrm{def}}{=}\neg\psi_{\ell}(x)\wedge\neg h_{\ell}(x)$ | $\mathsf{FOR}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}\neg\psi_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}h_{\ell}(x)$ | $\mathsf{NPV}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}\neg\psi_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}\neg h_{\ell}(x)$ |
| | $\mathsf{Recall}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}h_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}\psi_{\ell}(x)$ | $\mathsf{FallOut}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}\neg h_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}\psi_{\ell}(x)$ | | |
| | $\mathsf{MissRate}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}h_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}\neg\psi_{\ell}(x)$ | $\mathsf{Specificity}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}\neg h_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}\neg\psi_{\ell}(x)$ | | |

Equations

$s\models\gamma(x_{1},x_{2},\ldots,x_{k})$

$s\models\neg\psi$

$s\models\psi\wedge\psi^{\prime}$

$\mathfrak{M},w\models\mathop{\mathbb{P}_{I}}\psi$

$\mathfrak{M},w\models\neg\varphi$

$\mathfrak{M},w\models\varphi\wedge\varphi^{\prime}$

$\mathfrak{M},w\models\psi\supset\varphi$

$\mathfrak{M},w\models\mathop{\mathsf{K}_{a}}\varphi$

$\mathfrak{M}\models\varphi$

$\mathcal{R}_{\!\varepsilon}\stackrel{\mathrm{def}}{=}\{(w,w^{\prime})\in\mathcal{W}\times\mathcal{W}\mid\mathit{D}(\sigma_{w}(y)\parallel\sigma_{w^{\prime}}(y))\leq\varepsilon\},$

$\overline{\mathcal{R}_{\!\varepsilon}}=\{(w,w^{\prime})\in\mathcal{W}\times\mathcal{W}\mid\mathit{D}(\sigma_{w}(y)\parallel\sigma_{w^{\prime}}(y))>\varepsilon\}.$

$\mathfrak{M},w\models\overline{\mathop{\mathsf{K}_{\varepsilon}}}\varphi$ iff for every $w^{\prime}$ s.t. $\mathfrak{M},w^{\prime}\models\neg\varphi$, we have $(w,w^{\prime})\in\mathcal{R}_{\!\varepsilon}$.

$\mathfrak{M},w\models\overline{\mathop{\mathsf{P}_{\!\varepsilon}}}\varphi$

$s\models\psi(x,\hat{y})$

$s\models h(x,y)$

$\mathfrak{M},\mathit{w_{\sf real}}\models\mathop{\mathbb{P}_{0.2}}\psi_{\ell}(x).$

$\Pr\!\left[\,v\stackrel{\$}{\leftarrow}\sigma_{\mathit{w_{\sf real}}}(x)\,:\,H(v)=\ell\;\Big|\;C(v)=\ell\,\right]\in I,$

$\Pr\!\left[\,s\stackrel{\$}{\leftarrow}\mathit{w_{\sf real}}\,:\,s\models h_{\ell}(x)\;\Big|\;s\models\psi_{\ell}(x)\,\right]\in I.$

$\mathfrak{M},\mathit{w_{\sf real}}\models\mathsf{Precision}_{\ell,I}(x)$ where $\mathsf{Precision}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}\psi_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}h_{\ell}(x).$

$\mathsf{Recall}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}h_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}\psi_{\ell}(x).$

$\mathcal{R}_{\varepsilon}^{D}\stackrel{\mathrm{def}}{=}\{(w,w^{\prime})\in\mathcal{W}\times\mathcal{W}\mid\mathit{D}(\sigma_{w}(x)\parallel\sigma_{w^{\prime}}(x))\leq\varepsilon\},$

$\mathfrak{M},\mathit{w_{\sf real}}\models h_{\sf panda}(x)\supset\mathop{\mathsf{K}_{\varepsilon}^{\!D}}\mathop{\mathbb{P}_{0}}\psi_{\sf gibbon}(x),$

$\mathsf{TargetRobust}_{{\sf panda},\delta}(x,{\sf gibbon})\stackrel{\mathrm{def}}{=}\mathop{\mathsf{K}_{\varepsilon}^{\!D}}\bigl(h_{\sf panda}(x)\supset\mathop{\mathbb{P}_{[0,\delta]}}\psi_{\sf gibbon}(x)\bigr).$

$\mathsf{TotalRobust}_{\ell,I}(x)$

$\mathcal{R}_{\varepsilon}^{\sf tv}$

$\mathfrak{M},w_{d}\models\mathsf{GrpFair}(x,\hat{y})$

$\mathcal{R}_{\varepsilon}^{r,D}$

$\mathfrak{M},w_{d}\models\mathsf{IndFair}(x,\hat{y})$

$\Pr[\,C(x)=\hat{\ell}\mid x\in G,\,H(x)=\ell\,]=\Pr[\,C(x)=\hat{\ell}\mid x\in\mathcal{D}\setminus G,\,H(x)=\ell\,].$

$\forall i\in[0,1].\ \bigl(\xi_{d}\wedge\eta_{G}(x)\supset\mathsf{Recall}_{\ell,i}(x)\bigr)\leftrightarrow\bigl(\xi_{d}\wedge\neg\eta_{G}(x)\supset\mathsf{Recall}_{\ell,i}(x)\bigr).$

$\mathsf{EqOpp}(x,\hat{y})\stackrel{\mathrm{def}}{=}\bigl(\eta_{G}(x)\wedge\psi(x,\hat{y})\wedge h_{\ell}(x)\bigr)\supset\neg\overline{\mathop{\mathsf{P}_{0}^{\sf tv}}}\bigl(\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}(\neg\eta_{G}(x)\wedge\psi(x,\hat{y})\wedge h_{\ell}(x))\bigr).$

Full text

Towards Logical Specification of Statistical Machine Learning

Yusuke Kawamoto (ORCID: 0000-0002-2151-9560)

AIST, Tsukuba, Japan

This work was supported by JSPS KAKENHI Grant Number JP17K12667, by the New Energy and Industrial Technology Development Organization (NEDO), and by Inria under the project LOGIS.

Abstract

We introduce a logical approach to formalizing statistical properties of machine learning. Specifically, we propose a formal model for statistical classification based on a Kripke model, and formalize various notions of classification performance, robustness, and fairness of classifiers by using epistemic logic. Then we show some relationships among properties of classifiers and those between classification performance and robustness, which suggest robustness-related properties that have not been formalized in the literature as far as we know. To formalize fairness properties, we define a notion of counterfactual knowledge and show techniques to formalize conditional indistinguishability by using counterfactual epistemic operators. As far as we know, this is the first work that uses logical formulas to express statistical properties of machine learning, and that provides epistemic (resp. counterfactually epistemic) views on robustness (resp. fairness) of classifiers.

Keywords: Epistemic logic, Possible world semantics, Divergence, Machine learning, Statistical classification, Robustness, Fairness

1 Introduction

With the increasing use of machine learning in real-life applications, the safety and security of learning-based systems have been of great interest. In particular, many recent studies [36, 8] have found vulnerabilities in the robustness of deep neural networks (DNNs) to malicious inputs, which can lead to disasters in security-critical systems, such as self-driving cars. To find these vulnerabilities in advance, there has been research on formal verification and testing methods for the robustness of DNNs in recent years [22, 25, 33, 37]. However, relatively little attention has been paid to the formal specification of machine learning [34].

To describe the formal specification of security properties, logical approaches have proven useful for classifying desired properties and for developing theories to compare those properties. For example, security policies in temporal systems have been formalized as trace properties [1] or hyperproperties [9], which characterize the relationships among various security policies. As another example, epistemic logic [39] has been widely used as a formal policy language (e.g., for the authentication [5] and the anonymity [35, 20] of security protocols, and for the privacy of social networks [32]). As far as we know, however, no prior work has employed logical formulas to rigorously describe various statistical properties of machine learning, although there are some papers that (often informally) list various desirable properties of machine learning [34].

In this paper, we present a first logical formalization of statistical properties of machine learning. To describe the statistical properties in a simple and abstract way, we employ statistical epistemic logic (StatEL) [26], which was recently proposed to describe statistical knowledge and has been applied to formalize statistical hypothesis testing and statistical privacy of databases.

A key idea in our modeling of statistical machine learning is that we formalize logical properties at the syntax level by using logical formulas, and statistical distances at the semantics level by using accessibility relations of a Kripke model [28]. In this model, we formalize statistical classifiers and some of their desirable properties: classification performance, robustness, and fairness. More specifically, classification performance and robustness are described as the differences between the classifier’s recognition and the correct label (e.g., given by the human), whereas fairness is formalized as the conditional indistinguishability between two groups or individuals by using a notion of counterfactual knowledge.

Our contributions.

The main contributions of this work are as follows:

• We show a logical approach to formalizing statistical properties of machine learning in a simple and abstract way. In particular, we model logical properties at the syntax level, and statistical distances at the semantics level.

• We introduce a formal model for statistical classification. More specifically, we show how probabilistic behaviors of classifiers and non-deterministic adversarial inputs are formalized in a distributional Kripke model [26].

• We formalize the classification performance, robustness, and fairness of classifiers by using statistical epistemic logic (StatEL). As far as we know, this is the first work that uses logical formulas to formalize various statistical properties of machine learning, and that provides epistemic (resp. counterfactually epistemic) views on robustness (resp. fairness) of classifiers.

• We show some relationships among properties of classifiers, e.g., different strengths of robustness. We also present some relationships between classification performance and robustness, which suggest robustness-related properties that have not been formalized in the literature as far as we know.

• To formalize fairness properties, we define a notion of certain counterfactual knowledge and show techniques to formalize conditional indistinguishability by using counterfactual epistemic operators in StatEL. This enables us to express various fairness properties in a similar style of logical formulas.

Cautions and limitations.

In this paper, we focus on formalizing properties of classification problems and do not deal with the properties of learning algorithms (e.g., fairness through unawareness of sensitive attributes in data preparation), quality of training data (e.g., sample bias), quality of testing (e.g., coverage criteria), explainability, temporal properties, system level specification, or process agility in system development. It should be noted that most of the properties formalized in this paper are already known in the machine learning literature, and the novelty of this work lies in the logical formulation of those statistical properties.

We also remark that this work does not provide methods for checking, guaranteeing, or improving the performance/robustness/fairness of machine learning. As for the satisfiability of logical formulas, we leave the development of testing and (statistical) model checking algorithms as future work, since the research area on the testing and formal/statistical verification of machine learning is relatively new and needs further techniques to improve the scalability. Moreover, in some applications such as image recognition, some formulas (e.g., representing whether an input image is a panda or not) cannot be implemented mathematically, and require additional techniques based on experiments. Nevertheless, we demonstrate that describing various properties using logical formulas is useful to explore desirable properties and to discuss their relationships within a single framework.

Finally, we emphasize that our work is the first attempt to use logical formulas to express statistical properties of machine learning, and would be a starting point to develop theories of specification of machine learning in future research.

Paper organization.

The rest of this paper is organized as follows. Section 2 presents background on statistical epistemic logic (StatEL) and notations used in this paper. Section 3 defines counterfactual epistemic operators and shows techniques to model conditional indistinguishability using StatEL. Section 4 introduces a formal model for describing the behaviours of statistical classifiers and non-deterministic adversarial inputs. Sections 5, 6, and 7 respectively formalize the classification performance, robustness, and fairness of classifiers by using StatEL. Section 8 presents related work and Section 9 concludes.

2 Preliminaries

In this section we introduce some notations and recall the syntax and semantics of the statistical epistemic logic (StatEL) introduced in [26].

2.1 Notations

Let $\mathbb{R}^{\geq 0}$ be the set of non-negative real numbers, and $[0,1]$ be the set of non-negative real numbers not greater than $1$ . We denote by $\mathbb{D}\mathcal{O}$ the set of all probability distributions over a set $\mathcal{O}$ . Given a finite set $\mathcal{O}$ and a probability distribution $\mu\in\mathbb{D}\mathcal{O}$ , the probability of sampling a value $y$ from $\mu$ is denoted by $\mu[y]$ . For a subset $R\subseteq\mathcal{O}$ we define $\mu[R]$ by: $\mu[R]=\sum_{y\in R}\mu[y]$ . For a distribution $\mu$ over a finite set $\mathcal{O}$ , its support is defined by ${\mathtt{supp}}(\mu)=\{v\in\mathcal{O}\colon\mu[v]>0\}$ .

The total variation distance of two distributions $\mu,\mu^{\prime}\in\mathbb{D}\mathcal{O}$ is defined by: $\mathit{D}_{\sf tv}(\mu\parallel\mu^{\prime})\stackrel{\mathrm{def}}{=}\sup_{R\subseteq\mathcal{O}}|\mu[R]-\mu^{\prime}[R]|$ .
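As an illustration of this definition, the following minimal sketch computes the total variation distance for finite distributions represented as Python dictionaries. The representation and the function name `total_variation` are assumptions made for this example and do not come from the paper. On a finite set the supremum is attained at $R=\{y:\mu[y]>\mu^{\prime}[y]\}$, so the distance equals half of the $L_1$ distance, which is what the code uses.

```python
from typing import Dict, Hashable

# Hypothetical representation: a finite distribution is a dict value -> probability.
Dist = Dict[Hashable, float]


def total_variation(mu: Dist, nu: Dist) -> float:
    """Total variation distance between two finite distributions."""
    support = set(mu) | set(nu)
    # The supremum over subsets R is attained at R = {y : mu[y] > nu[y]},
    # which gives half of the L1 distance between mu and nu.
    return 0.5 * sum(abs(mu.get(y, 0.0) - nu.get(y, 0.0)) for y in support)


mu = {"a": 0.5, "b": 0.5}
nu = {"a": 0.25, "b": 0.25, "c": 0.5}
print(total_variation(mu, nu))
```

Here the maximizing subset is $R=\{\mathtt{a},\mathtt{b}\}$, on which the two distributions differ by $1.0-0.5$.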

2.2 Syntax of StatEL

We recall the syntax of the statistical epistemic logic (StatEL) [26], which has two levels of formulas: static and epistemic formulas. Intuitively, a static formula describes a proposition satisfied at a deterministic state, while an epistemic formula describes a proposition satisfied at a probability distribution of states. In this paper, the former is used only to define the latter.

Formally, let $\mathtt{Mes}$ be a set of symbols called measurement variables, and $\Gamma$ be a set of atomic formulas of the form $\gamma(x_{1},x_{2},\ldots,x_{n})$ for a predicate symbol $\gamma$ , $n\geq 0$ , and $x_{1},x_{2},\ldots,x_{n}\in\mathtt{Mes}$ . Let $I\subseteq[0,1]$ be a finite union of disjoint intervals, and $\mathcal{A}$ be a finite set of indices (e.g., associated with statistical divergences). Then the formulas are defined by:

Static formulas: $\psi\mathbin{::=}\gamma(x_{1},x_{2},\ldots,x_{n})\mid\neg\psi\mid\psi\wedge\psi$
Epistemic formulas: $\varphi\mathbin{::=}\mathop{\mathbb{P}_{I}}\psi\mid\neg\varphi\mid\varphi\wedge\varphi\mid\psi\supset\varphi\mid\mathop{\mathsf{K}_{a}}\varphi$

where $a\in\mathcal{A}$ . We denote by $\mathcal{F}$ the set of all epistemic formulas. Note that we have no quantifiers over measurement variables. (See Section 2.4 for more details.)

The probability quantification $\mathop{\mathbb{P}_{I}}\psi$ represents that a static formula $\psi$ is satisfied with a probability belonging to a set $I$ . For instance, $\mathop{\mathbb{P}_{(0.95,1]}}\psi$ represents that $\psi$ holds with a probability greater than $0.95$ . By $\psi\supset\mathop{\mathbb{P}_{I}}\psi^{\prime}$ we represent that the conditional probability of $\psi^{\prime}$ given $\psi$ is included in a set $I$ . The epistemic knowledge $\mathop{\mathsf{K}_{a}}\varphi$ expresses that we know $\varphi$ with a confidence specified by $a$ .

As syntactic sugar, we use disjunction $\vee$ , classical implication $\rightarrow$ , and epistemic possibility $\mathop{\mathsf{P}_{\!a}}$ , defined as usual by: $\varphi_{0}\vee\varphi_{1}\mathbin{::=}\neg(\neg\varphi_{0}\wedge\neg\varphi_{1})$ , $\varphi_{0}\rightarrow\varphi_{1}\mathbin{::=}\neg\varphi_{0}\vee\varphi_{1}$ , and $\mathop{\mathsf{P}_{\!a}}{\varphi}\mathbin{::=}\neg\mathop{\mathsf{K}_{a}}\neg\varphi$ . When $I$ is a singleton $\{i\}$ , we abbreviate $\mathop{\mathbb{P}_{I}}$ as $\mathop{\mathbb{P}_{i}}$ .
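The two-level grammar and its derived operators can be pictured as a small abstract syntax tree. The sketch below is only one possible encoding. All class and function names are hypothetical, and the set $I$ is simplified to a single closed interval `[lo, hi]`, which is narrower than the finite unions of intervals allowed above.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# --- Static formulas: psi ::= gamma(x1, ..., xn) | not psi | psi and psi ---
@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class SNot:
    arg: "Static"


@dataclass(frozen=True)
class SAnd:
    left: "Static"
    right: "Static"


Static = Union[Atom, SNot, SAnd]


# --- Epistemic formulas: phi ::= P_I psi | not phi | phi and phi | psi > phi | K_a phi ---
@dataclass(frozen=True)
class Prob:
    lo: float  # assumption: I is the single closed interval [lo, hi]
    hi: float
    arg: Static


@dataclass(frozen=True)
class ENot:
    arg: "Epistemic"


@dataclass(frozen=True)
class EAnd:
    left: "Epistemic"
    right: "Epistemic"


@dataclass(frozen=True)
class Cond:  # psi ⊃ phi (conditioning on a static formula)
    guard: Static
    body: "Epistemic"


@dataclass(frozen=True)
class Know:  # K_a phi
    index: str
    arg: "Epistemic"


Epistemic = Union[Prob, ENot, EAnd, Cond, Know]


# --- Derived operators, following the definitions in the text ---
def e_or(p: Epistemic, q: Epistemic) -> Epistemic:
    return ENot(EAnd(ENot(p), ENot(q)))


def e_implies(p: Epistemic, q: Epistemic) -> Epistemic:
    return e_or(ENot(p), q)


def possible(a: str, p: Epistemic) -> Epistemic:
    return ENot(Know(a, ENot(p)))


# Example: psi_l(x) ⊃ P_[0.9, 1] h_l(x), a precision-style formula.
precision = Cond(Atom("psi_l", ("x",)), Prob(0.9, 1.0, Atom("h_l", ("x",))))
```

Keeping `Static` and `Epistemic` as separate types mirrors the restriction that $\mathop{\mathbb{P}_{I}}$ and the guard of $\supset$ take only static formulas.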

2.3 Distributional Kripke Model

Next we recall the notion of a distributional Kripke model [26], where each possible world is a probability distribution over a set $\mathcal{S}$ of states and each world $w$ is associated with a stochastic assignment $\sigma_{w}$ to measurement variables.

Definition 1 (Distributional Kripke model)

Let $\mathcal{A}$ be a finite set of indices (typically associated with statistical tests and their thresholds), $\mathcal{S}$ be a finite set of states, and $\mathcal{O}$ be a finite set of data. A distributional Kripke model is a tuple $\mathfrak{M}=(\mathcal{W},(\mathcal{R}_{a})_{a\in\mathcal{A}},(V_{s})_{s\in\mathcal{S}})$ consisting of:

• a non-empty set $\mathcal{W}$ of probability distributions over a finite set $\mathcal{S}$ of states;

• for each $a\in\mathcal{A}$ , an accessibility relation $\mathcal{R}_{a}\subseteq\mathcal{W}\times\mathcal{W}$ ;

• for each $s\in\mathcal{S}$ , a valuation $V_{s}$ that maps each $k$ -ary predicate $\gamma$ to a set $V_{s}(\gamma)\subseteq\mathcal{O}^{k}$ .

The set $\mathcal{W}$ is called a universe, and its elements are called possible worlds. All measurement variables range over the same set $\mathcal{O}$ in every world.

We assume that each $w\in\mathcal{W}$ is associated with a function $\rho_{w}:\mathtt{Mes}\times\mathcal{S}\rightarrow\mathcal{O}$ that maps each measurement variable $x$ to its value $\rho_{w}(x,s)$ observed at a state $s$ . We also assume that each state $s$ in a world $w$ is associated with the assignment $\sigma_{s}:\mathtt{Mes}\rightarrow\mathcal{O}$ defined by $\sigma_{s}(x)=\rho_{w}(x,s)$ .

Since each world $w$ is a distribution of states, we denote by $w[s]$ the probability that a state $s$ is sampled from $w$ . Then the probability that a measurement variable $x$ has a value $v$ is given by $\sigma_{w}(x)[v]=\sum_{\begin{subarray}{c}s\in{\mathtt{supp}}(w),\sigma_{s}(x)=v\end{subarray}}w[s]$ . This implies that, when a state $s$ is drawn from $w$ , an input $\sigma_{s}(x)$ is sampled from the distribution $\sigma_{w}(x)$ .
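The sum defining $\sigma_{w}(x)$ is a pushforward of the state distribution $w$ along the deterministic assignments. A minimal sketch follows, assuming a world is a dictionary from states to probabilities and $\rho_{w}$ is a dictionary keyed by (variable, state) pairs. Both representations, and the names `stochastic_assignment`, `w`, and `rho`, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

World = Dict[str, float]                 # w[s]: probability of state s
Rho = Dict[Tuple[str, str], Hashable]    # rho[(x, s)]: value of variable x at state s


def stochastic_assignment(w: World, rho: Rho, x: str) -> Dict[Hashable, float]:
    """sigma_w(x): distribution of the value of x when a state is drawn from w."""
    dist: Dict[Hashable, float] = defaultdict(float)
    for s, p in w.items():
        if p > 0:  # only states in supp(w) contribute
            dist[rho[(x, s)]] += p
    return dict(dist)


w = {"s1": 0.5, "s2": 0.25, "s3": 0.25}
rho = {("x", "s1"): "img_a", ("x", "s2"): "img_b", ("x", "s3"): "img_a"}
sigma_w_x = stochastic_assignment(w, rho, "x")
print(sigma_w_x)
```

States `s1` and `s3` carry the same input, so their probabilities are added for `img_a`.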

2.4 Stochastic Semantics of StatEL

Now we recall the stochastic semantics [26] for the StatEL formulas over a distributional Kripke model $\mathfrak{M}=(\mathcal{W},(\mathcal{R}_{a})_{a\in\mathcal{A}},(V_{s})_{s\in\mathcal{S}})$ with $\mathcal{W}=\mathbb{D}\mathcal{S}$ .

The interpretation of static formulas $\psi$ at a state $s$ is given by:

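$s\models\gamma(x_{1},x_{2},\ldots,x_{k})$ iff $(\sigma_{s}(x_{1}),\sigma_{s}(x_{2}),\ldots,\sigma_{s}(x_{k}))\in V_{s}(\gamma)$

$s\models\neg\psi$ iff $s\not\models\psi$

$s\models\psi\wedge\psi^{\prime}$ iff $s\models\psi$ and $s\models\psi^{\prime}$.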

The restriction $w|_{\psi}$ of a world $w$ to a static formula $\psi$ is defined by $w|_{\psi}[s]=\frac{w[s]}{\sum_{s^{\prime}:s^{\prime}\models\psi}w[s^{\prime}]}$ if $s\models\psi$ , and $w|_{\psi}[s]=0$ otherwise. Note that $w|_{\psi}$ is undefined if there is no state $s$ that satisfies $\psi$ and has a non-zero probability in $w$ .

Then the interpretation of epistemic formulas in a world $w$ is defined by:

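$\mathfrak{M},w\models\mathop{\mathbb{P}_{I}}\psi$ iff $\Pr\!\left[\,s\stackrel{\$}{\leftarrow}w\,:\,s\models\psi\,\right]\in I$

$\mathfrak{M},w\models\neg\varphi$ iff $\mathfrak{M},w\not\models\varphi$

$\mathfrak{M},w\models\varphi\wedge\varphi^{\prime}$ iff $\mathfrak{M},w\models\varphi$ and $\mathfrak{M},w\models\varphi^{\prime}$

$\mathfrak{M},w\models\psi\supset\varphi$ iff $\mathfrak{M},w|_{\psi}\models\varphi$

$\mathfrak{M},w\models\mathop{\mathsf{K}_{a}}\varphi$ iff for every $w^{\prime}$ s.t. $(w,w^{\prime})\in\mathcal{R}_{a}$ , we have $\mathfrak{M},w^{\prime}\models\varphi$ ,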

where $s\stackrel{\$}{\leftarrow}w$ represents that a state $s$ is sampled from the distribution $w$ .

Then $\mathfrak{M},w\models\psi_{0}\supset\mathop{\mathbb{P}_{I}}\psi_{1}$ represents that the conditional probability of satisfying a static formula $\psi_{1}$ given another $\psi_{0}$ is included in a set $I$ at a world $w$ .
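The restriction $w|_{\psi}$ and the conditional reading of $\psi_{0}\supset\mathop{\mathbb{P}_{I}}\psi_{1}$ can be sketched on a finite world as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: static formulas are modelled as Python predicates on states, $I$ is taken to be a single closed interval, and the formula is treated as unsatisfied when $w|_{\psi_{0}}$ is undefined. All function names are hypothetical.

```python
from typing import Callable, Dict, Optional

State = str
World = Dict[State, float]          # w[s]: probability of sampling state s
Static = Callable[[State], bool]    # a static formula, evaluated at a state


def prob(w: World, psi: Static) -> float:
    """Probability that a state sampled from w satisfies psi."""
    return sum(p for s, p in w.items() if psi(s))


def restrict(w: World, psi: Static) -> Optional[World]:
    """w|_psi, or None if no state with non-zero probability satisfies psi."""
    z = prob(w, psi)
    if z == 0:
        return None
    return {s: p / z for s, p in w.items() if psi(s)}


def sat_prob(w: World, psi: Static, lo: float, hi: float) -> bool:
    """w |= P_[lo, hi] psi (assumption: I is one closed interval)."""
    return lo <= prob(w, psi) <= hi


def sat_cond_prob(w: World, psi0: Static, psi1: Static, lo: float, hi: float) -> bool:
    """w |= psi0 ⊃ P_[lo, hi] psi1, evaluated as w|_psi0 |= P_[lo, hi] psi1."""
    w0 = restrict(w, psi0)
    # Assumption: an undefined restriction makes the formula unsatisfied.
    return w0 is not None and sat_prob(w0, psi1, lo, hi)


w = {"s1": 0.5, "s2": 0.25, "s3": 0.25}
psi0 = lambda s: s in {"s1", "s2"}
psi1 = lambda s: s == "s1"
print(sat_cond_prob(w, psi0, psi1, 0.6, 0.7))
```

In this toy world $\psi_{0}$ has probability $0.75$ and $\psi_{0}\wedge\psi_{1}$ has probability $0.5$, so the conditional probability is $2/3$, which lies in $[0.6,0.7]$.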

In each world $w$ , measurement variables can be interpreted using $\sigma_{w}$ . This allows us to assign different values to different occurrences of a variable in a formula; e.g., in $\varphi(x)\rightarrow\mathop{\mathsf{K}_{a}}\varphi^{\prime}(x)$ , $x$ occurring in $\varphi(x)$ is interpreted by $\sigma_{w}$ in a world $w$ , while $x$ in $\varphi^{\prime}(x)$ is interpreted by $\sigma_{w^{\prime}}$ in another $w^{\prime}$ s.t. $(w,w^{\prime})\in\mathcal{R}_{a}$ .

Finally, the interpretation of an epistemic formula $\varphi$ in $\mathfrak{M}$ is given by:

$\mathfrak{M}\models\varphi$ iff $\mathfrak{M},w\models\varphi$ for all $w\in\mathcal{W}$ .

3 Techniques for Conditional Indistinguishability

In this section we introduce some modal operators to define a notion of “counterfactual knowledge” using StatEL, and show how to employ them to formalize conditional indistinguishability properties. The techniques presented here are used to formalize some fairness properties of machine learning in Section 7.

3.1 Counterfactual Epistemic Operators

Let us consider an accessibility relation $\mathcal{R}_{\!\varepsilon}$ based on a statistical divergence $\mathit{D}(\cdot\parallel\cdot):\mathbb{D}\mathcal{O}\times\mathbb{D}\mathcal{O}\rightarrow\mathbb{R}^{\geq 0}$ and a threshold $\varepsilon\in\mathbb{R}^{\geq 0}$ defined by:

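$\mathcal{R}_{\!\varepsilon}\stackrel{\mathrm{def}}{=}\{(w,w^{\prime})\in\mathcal{W}\times\mathcal{W}\mid\mathit{D}(\sigma_{w}(y)\parallel\sigma_{w^{\prime}}(y))\leq\varepsilon\},$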

where $y$ is the measurement variable observable in each world in $\mathcal{W}$ . Intuitively, $(w,w^{\prime})\in\mathcal{R}_{\!\varepsilon}$ represents that the probability distribution $\sigma_{w}(y)$ of the data $y$ observed in a world $w$ is indistinguishable from that in another world $w^{\prime}$ in terms of $D$ .

Now we define the complement relation of $\mathcal{R}_{\!\varepsilon}$ by $\overline{\mathcal{R}_{\!\varepsilon}}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}(\mathcal{W}\times\mathcal{W})\setminus\mathcal{R}_{\!\varepsilon}$ , namely,

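$\overline{\mathcal{R}_{\!\varepsilon}}=\{(w,w^{\prime})\in\mathcal{W}\times\mathcal{W}\mid\mathit{D}(\sigma_{w}(y)\parallel\sigma_{w^{\prime}}(y))>\varepsilon\}.$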

Thus $(w,w^{\prime})\in\overline{\mathcal{R}_{\!\varepsilon}}$ represents that the distribution $\sigma_{w}(y)$ observed in $w$ can be distinguished from that in $w^{\prime}$ . The corresponding epistemic operator $\overline{\mathop{\mathsf{K}_{\varepsilon}}}$ , which we call a counterfactual epistemic operator, is interpreted as:

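$\mathfrak{M},w\models\overline{\mathop{\mathsf{K}_{\varepsilon}}}\varphi$ iff for every $w^{\prime}$ s.t. $(w,w^{\prime})\in\overline{\mathcal{R}_{\!\varepsilon}}$ , we have $\mathfrak{M},w^{\prime}\models\varphi$ (1)

iff for every $w^{\prime}$ s.t. $\mathfrak{M},w^{\prime}\models\neg\varphi$ , we have $(w,w^{\prime})\in\mathcal{R}_{\!\varepsilon}$ . (2)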

Intuitively, (1) represents that if we were located in a possible world $w^{\prime}$ that looked distinguishable from the real world $w$ , then $\varphi$ would always hold. This means a counterfactual knowledge¹ in the sense that, if we had an observation different from the real world, then we would know $\varphi$ . This is logically equivalent to (2), representing that all possible worlds $w^{\prime}$ that do not satisfy $\varphi$ look indistinguishable from the real world $w$ in terms of $D$ .

¹ Our definition of counterfactual knowledge is limited to the condition of having an observation different from the actual one. More general notions of counterfactual knowledge can be found in previous work (e.g., [38]).

We remark that the dual operator $\overline{\mathop{\mathsf{P}_{\!\varepsilon}}}$ is interpreted as:

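$\mathfrak{M},w\models\overline{\mathop{\mathsf{P}_{\!\varepsilon}}}\varphi$ iff there exists a $w^{\prime}$ s.t. $(w,w^{\prime})\in\overline{\mathcal{R}_{\!\varepsilon}}$ and $\mathfrak{M},w^{\prime}\models\varphi$ .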

This means a counterfactual possibility in the sense that it might be the case that we had an observation different from the real world and considered $\varphi$ possible.
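The operators of this subsection can be tried out on a toy universe. The sketch below assumes a finite set of named worlds (whereas the paper takes $\mathcal{W}=\mathbb{D}\mathcal{S}$), uses total variation as the divergence $D$, and models an epistemic formula as a Python predicate on world names. All identifiers are hypothetical. The operator $\overline{\mathop{\mathsf{K}_{\varepsilon}}}$ is read as the usual Kripke box over the complement relation, and $\overline{\mathop{\mathsf{P}_{\!\varepsilon}}}$ as its dual.

```python
from typing import Callable, Dict

Dist = Dict[str, float]
Obs = Dict[str, Dist]            # obs[w] = sigma_w(y), the observable distribution in w
Formula = Callable[[str], bool]  # an epistemic formula as a predicate on world names


def tv(mu: Dist, nu: Dist) -> float:
    """Total variation distance, used here as the divergence D."""
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in set(mu) | set(nu))


def accessible(obs: Obs, eps: float, w: str, w2: str) -> bool:
    """(w, w2) in R_eps: observations differ by at most eps."""
    return tv(obs[w], obs[w2]) <= eps


def k_bar(obs: Obs, eps: float, w: str, phi: Formula) -> bool:
    """Counterfactual knowledge: phi holds in every world distinguishable from w."""
    return all(phi(w2) for w2 in obs if not accessible(obs, eps, w, w2))


def k_bar_alt(obs: Obs, eps: float, w: str, phi: Formula) -> bool:
    """Equivalent form: every world violating phi is indistinguishable from w."""
    return all(accessible(obs, eps, w, w2) for w2 in obs if not phi(w2))


def p_bar(obs: Obs, eps: float, w: str, phi: Formula) -> bool:
    """Counterfactual possibility: phi holds in some world distinguishable from w."""
    return any(phi(w2) for w2 in obs if not accessible(obs, eps, w, w2))


obs = {
    "w0": {"a": 0.5, "b": 0.5},
    "w1": {"a": 0.75, "b": 0.25},  # distance 0.25 from w0
    "w2": {"a": 1.0},              # distance 0.5 from w0
}
phi = lambda w: w == "w2"
print(k_bar(obs, 0.25, "w0", phi), p_bar(obs, 0.25, "w0", phi))
```

With $\varepsilon=0.25$, only `w2` is distinguishable from `w0`, so a formula that holds exactly at `w2` is counterfactually known at `w0`, and both readings of $\overline{\mathop{\mathsf{K}_{\varepsilon}}}$ agree.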

3.2 Conditional Indistinguishability via Counterfactual Knowledge

As shown in Section 7, some fairness notions in machine learning are based on conditional indistinguishability of the form (2), hence can be expressed using counterfactual epistemic operators.

Specifically, we use the following proposition, which states that if two static formulas $\psi$ and $\psi^{\prime}$ are respectively satisfied in worlds $w$ and $w^{\prime}$ with probability $1$ , then the indistinguishability between $w$ and $w^{\prime}$ can be expressed as $w\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{P}_{1}}\psi^{\prime}$ . Note that this formula means that there is no possible world where we have an observation different from the real world $w$ (satisfying $\psi$ ) but we think $\psi^{\prime}$ possible; i.e., the formula means that if $\psi^{\prime}$ is satisfied then we have an observation indistinguishable from that in the real world $w$ .

Proposition 1 (Conditional indistinguishability)

Let $\mathfrak{M}=(\mathcal{W},(\mathcal{R}_{a})_{a\in\mathcal{A}},\allowbreak(V_{s})_{s\in\mathcal{S}})$ be a distributional Kripke model with the universe $\mathcal{W}=\mathbb{D}\mathcal{S}$ . Let $\psi$ and $\psi^{\prime}$ be static formulas, and $a\in\mathcal{A}$ .

(i) $\mathfrak{M}\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{P}_{1}}\psi^{\prime}$ iff for any $w,w^{\prime}\in\mathcal{W}$ , $\mathfrak{M},w\models\mathop{\mathbb{P}_{1}}\psi$ and $\mathfrak{M},w^{\prime}\models\mathop{\mathbb{P}_{1}}\psi^{\prime}$ imply $(w,w^{\prime})\in\mathcal{R}_{a}$ .

(ii) If $\mathcal{R}_{a}$ is symmetric, then $\mathfrak{M}\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{P}_{1}}\psi^{\prime}$ iff $\mathfrak{M}\models\psi^{\prime}\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{P}_{1}}\psi$ .

See Appendix 0.A for the proof.

4 Formal Model for Statistical Classification

In this section we introduce a formal model for statistical classification by using distributional Kripke models (Definition 1). In particular, we formalize a probabilistic behaviour of a classifier $C$ and a non-deterministic input $x$ from an adversary in a distributional Kripke model.

4.1 Statistical Classification Problems

Multiclass classification is the problem of classifying a given input into one of multiple classes. Let $\mathtt{L}$ be a finite set of class labels, and $\mathcal{D}$ be a finite set of input data (called feature vectors) that we want to classify. Then a classifier is a function $C:\mathcal{D}\rightarrow\mathtt{L}$ that receives an input datum and predicts which class (among $\mathtt{L}$ ) the input belongs to. Here we do not model how classifiers are constructed from a set of training data, but deal with a situation where some classifier $C$ has already been obtained and its properties should be evaluated.

Let $f:\mathcal{D}\times\mathtt{L}\rightarrow\mathbb{R}$ be a scoring function that gives a score $f(v,\ell)$ of predicting the class of an input datum (feature vector) $v$ as a label $\ell$ . Then for each input $v\in\mathcal{D}$ , we write $H(v)=\ell$ to represent that a label $\ell$ maximizes $f(v,\ell)$ . For example, when the input $v$ is an image of an animal and $\ell$ is the animal’s name, $H(v)=\ell$ may represent that an oracle (or “human”) classifies the image $v$ as $\ell$ .

4.2 Modeling the Behaviours of Classifiers

Classifiers are formalized on a distributional Kripke model $\mathfrak{M}=(\mathcal{W},(\mathcal{R}_{a})_{a\in\mathcal{A}},\allowbreak(V_{s})_{s\in\mathcal{S}})$ with $\mathcal{W}=\mathbb{D}\mathcal{S}$ and a real world $\mathit{w_{\sf real}}\in\mathcal{W}$ . Recall that each world $w\in\mathcal{W}$ is a probability distribution over the set $\mathcal{S}$ of states and has a stochastic assignment $\sigma_{w}:\mathtt{Mes}\rightarrow\mathbb{D}\mathcal{O}$ that is consistent with the deterministic assignments $\sigma_{s}$ for all $s\in\mathcal{S}$ (as explained in Section 2.3).

We present an overview of our formalization in Fig. 1. We denote by $x\in\mathtt{Mes}$ an input datum given to the classifier $C$ (and to the oracle $H$ ), by $y\in\mathtt{Mes}$ a correct label given by the oracle $H$ , and by $\hat{y}\in\mathtt{Mes}$ a label predicted by $C$ . We assume that the input variable $x$ (resp. the output variables $y,\hat{y}$ ) ranges over the set $\mathcal{D}$ of input data (resp. the set $\mathtt{L}$ of labels); i.e., the deterministic assignment $\sigma_{s}$ at each state $s\in\mathcal{S}$ has the range $\mathcal{O}=\mathcal{D}\cup\mathtt{L}$ and satisfies $\sigma_{s}(x)\in\mathcal{D}$ and $\sigma_{s}(y),\sigma_{s}(\hat{y})\in\mathtt{L}$ .

A key idea in our modeling is that we formalize logical properties at the syntax level by using logical formulas, and statistical distances at the semantics level by using accessibility relations $\mathcal{R}_{a}$ . In this way, we can formalize various statistical properties of classifiers in a simple and abstract way.

To formalize a classifier $C$ , we introduce a static formula $\psi(x,\hat{y})$ to represent that $C$ classifies a given input $x$ as a class $\hat{y}$ . We also introduce a static formula $h(x,y)$ to represent that $y$ is the actual class of an input $x$ . As an abbreviation, we write $\psi_{\ell}(x)$ (resp. $h_{\ell}(x)$ ) to denote $\psi(x,\ell)$ (resp. $h(x,\ell)$ ). Formally, these static formulas are interpreted at each state $s\in\mathcal{S}$ as follows:

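$s\models\psi(x,\hat{y})$ iff $C(\sigma_{s}(x))=\sigma_{s}(\hat{y})$

$s\models h(x,y)$ iff $H(\sigma_{s}(x))=\sigma_{s}(y)$ .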
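To make the roles of $\psi_{\ell}(x)$ and $h_{\ell}(x)$ concrete, the following sketch evaluates them at individual states and sorts the states into the four cells of the table of confusion. It assumes that $\psi_{\ell}(x)$ holds at a state $s$ exactly when $C(\sigma_{s}(x))=\ell$ , and likewise for $h_{\ell}(x)$ with the oracle $H$ . The toy classifier, the toy oracle, and the integer inputs are invented purely for illustration.

```python
from typing import Callable, Dict, List

State = Dict[str, int]  # assumption: a state records the value sigma_s(x) under key "x"


def C(v: int) -> str:
    """Toy classifier (hypothetical): predicts 'pos' for inputs >= 5."""
    return "pos" if v >= 5 else "neg"


def H(v: int) -> str:
    """Toy oracle (hypothetical): the correct label is 'pos' for inputs >= 4."""
    return "pos" if v >= 4 else "neg"


def psi(label: str) -> Callable[[State], bool]:
    """psi_label(x): the classifier assigns `label` to the input at this state."""
    return lambda s: C(s["x"]) == label


def h(label: str) -> Callable[[State], bool]:
    """h_label(x): `label` is the actual class of the input at this state."""
    return lambda s: H(s["x"]) == label


states: List[State] = [{"x": v} for v in range(10)]
p, a = psi("pos"), h("pos")

tp = [s for s in states if p(s) and a(s)]          # psi and h
fp = [s for s in states if p(s) and not a(s)]      # psi and not h
fn = [s for s in states if not p(s) and a(s)]      # not psi and h
tn = [s for s in states if not p(s) and not a(s)]  # not psi and not h
print(len(tp), len(fp), len(fn), len(tn))
```

The only disagreement between the toy classifier and the toy oracle is the input `4`, which lands in the false-negative cell.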

4.3 Modeling the Non-deterministic Inputs from Adversaries

As explained in Section 2.3, when a state $s$ is drawn from a distribution $w\in\mathcal{W}$ , an input value $\sigma_{s}(x)$ is sampled from the distribution $\sigma_{w}(x)$ , and assigned to the measurement variable $x$ . Since $x$ denotes the input to the classifier $C$ , the input distribution $\sigma_{w}(x)$ over $\mathcal{D}$ can be regarded as the test dataset. This means that each world $w$ corresponds to a test dataset $\sigma_{w}(x)$ . For instance, $\sigma_{\mathit{w_{\sf real}}}(x)$ in the real world $\mathit{w_{\sf real}}$ represents the actual test dataset. The set of all possible test datasets (i.e., possible distributions of inputs to $C$ ) is represented by $\Lambda\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\left\{\sigma_{w}(x)\mid w\in\mathcal{W}\right\}$ . Note that $\Lambda$ can be an infinite set.

For example, let us consider testing the classifier $C$ with the actual test dataset $\sigma_{\mathit{w_{\sf real}}}(x)$ . When $C$ assigns a label $\ell$ to an input $x$ with probability $0.2$ , i.e., $\Pr\!\left[\,v\stackrel{\$}{\leftarrow}\sigma_{\mathit{w_{\sf real}}}(x)\,:\,C(v)=\ell\,\right]=0.2$ , then this can be expressed by:

$\mathfrak{M},\mathit{w_{\sf real}}\models\mathop{\mathbb{P}_{0.2}}\psi_{\ell}(x)$ .
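The probability statement in this example can be estimated on a finite test set. The following is a minimal sketch, assuming that the test dataset $\sigma_{\mathit{w_{\sf real}}}(x)$ is the uniform distribution over a finite list of inputs; the names `label_probability`, `holds_with_probability`, `toy_classifier`, and `toy_dataset` are hypothetical and do not come from the paper.

```python
from fractions import Fraction


def label_probability(dataset, classifier, label):
    """Probability that `classifier` outputs `label` on an input drawn
    uniformly from the finite list `dataset` (a stand-in for sigma_w(x))."""
    if not dataset:
        raise ValueError("dataset must be non-empty")
    hits = sum(1 for v in dataset if classifier(v) == label)
    return Fraction(hits, len(dataset))


def holds_with_probability(dataset, classifier, label, interval):
    """Check whether the label probability lies in the closed interval (lo, hi)."""
    lo, hi = interval
    return lo <= label_probability(dataset, classifier, label) <= hi


def toy_classifier(v):
    """Hypothetical classifier: assigns the label 'ell' to inputs 8 and 9."""
    return "ell" if v >= 8 else "other"


# Hypothetical test dataset: the inputs 0..9, each with probability 1/10.
toy_dataset = list(range(10))
```

On this toy dataset the classifier assigns the label to two of ten inputs, so the probability is exactly $0.2$, matching the example in the text. Exact fractions are used so that membership in a point interval is not affected by floating-point rounding.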

We can also formalize a non-deterministic input $x$ from an adversary in this model as follows. Although each state $s$ in a possible world $w$ is assigned the probability $w[s]$ , each possible world $w$ itself is not assigned a probability. Thus, each input distribution $\sigma_{w}(x)\in\Lambda$ itself is also not assigned a probability, hence our model assumes no probability distribution over $\Lambda$ . In other words, we assume that a world $w$ and thus an adversary’s input distribution $\sigma_{w}(x)$ are non-deterministically chosen. This is useful to model an adversary’s malicious inputs in the definitions of security properties, because we usually do not have prior knowledge of the distribution of malicious inputs from adversaries, and need to reason about the worst cases caused by the attack. In Section 6, this formalization of non-deterministic inputs is used to express the robustness of classifiers.

Finally, it should be noted that we cannot enumerate all possible adversarial inputs, hence cannot construct $\mathcal{W}$ by collecting their corresponding worlds. Since $\mathcal{W}$ can be an infinite set and is unspecified, we do not aim at checking whether or not a formula is satisfied in all possible worlds of $\mathcal{W}$ . Nevertheless, as shown in later sections, describing various properties using StatEL is useful to explore desirable properties and to discuss relationships among them.

5 Formalizing the Classification Performance

In this section we show a formalization of classification performance using StatEL (See Fig. 2 for basic ideas). In classification problems, the terms positive/negative represent the result of the classifier’s prediction, and the terms true/false represent whether the classifier predicts correctly or not. Then the following terminologies are commonly used:

( $\mathit{tp}$ )

true positive means both the prediction and actual class are positive;

( $\mathit{tn}$ )

true negative means both the prediction and actual class are negative;

( $\mathit{fp}$ )

false positive means the prediction is positive but the actual class is negative;

( $\mathit{fn}$ )

false negative means the prediction is negative but the actual class is positive.

These terminologies can be formalized using StatEL as shown in Table 1. For example, when an input $x$ shows true positive at a state $s$ , this can be expressed as $s\models\psi_{\ell}(x)\wedge h_{\ell}(x)$ . True negative, false positive (Type I error), and false negative (Type II error) are respectively expressed as $s\models\neg\psi_{\ell}(x)\wedge\neg h_{\ell}(x)$ , $s\models\psi_{\ell}(x)\wedge\neg h_{\ell}(x)$ , and $s\models\neg\psi_{\ell}(x)\wedge h_{\ell}(x)$ .
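The four cases can be illustrated with a small sketch. It assumes that the classifier and the oracle are ordinary functions from inputs to labels and that states are represented directly by their inputs; the names `outcome` and `confusion_counts` are hypothetical.

```python
from collections import Counter


def outcome(predicted_positive, actually_positive):
    """Name the confusion-table cell for one state:
    psi_l(x) is `predicted_positive`, h_l(x) is `actually_positive`."""
    if predicted_positive:
        return "tp" if actually_positive else "fp"
    return "fn" if actually_positive else "tn"


def confusion_counts(inputs, classifier, oracle, label):
    """Count tp, tn, fp, fn over a finite list of inputs for the class `label`."""
    counts = Counter({"tp": 0, "tn": 0, "fp": 0, "fn": 0})
    for v in inputs:
        counts[outcome(classifier(v) == label, oracle(v) == label)] += 1
    return dict(counts)
```

Here positive means that the prediction equals the chosen label $\ell$, so the same function covers multi-class classifiers by treating every other label as negative.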

Then precision (positive predictive value) is defined as the conditional probability that the prediction is correct given that the prediction is positive; i.e., ${\it precision}=\frac{\mathit{tp}}{\mathit{tp}+\mathit{fp}}$ . Since the test dataset distribution in the real world $\mathit{w_{\sf real}}$ is expressed as $\sigma_{\mathit{w_{\sf real}}}(x)$ (as explained in Section 4.3), the precision being within an interval $I$ is given by:

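$\Pr\!\left[\,v\stackrel{\$}{\leftarrow}\sigma_{\mathit{w_{\sf real}}}(x)\,:\,H(v)=\ell\;\middle|\;C(v)=\ell\,\right]\in I$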

which can be written as:

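$\frac{\Pr\left[\,v\stackrel{\$}{\leftarrow}\sigma_{\mathit{w_{\sf real}}}(x)\,:\,H(v)=\ell\mbox{ and }C(v)=\ell\,\right]}{\Pr\left[\,v\stackrel{\$}{\leftarrow}\sigma_{\mathit{w_{\sf real}}}(x)\,:\,C(v)=\ell\,\right]}\in I$ .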

By using StatEL, this can be formalized as:

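$\mathfrak{M},\mathit{w_{\sf real}}\models\mathsf{Precision}_{\ell,I}(x)$ where $\mathsf{Precision}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}\psi_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}h_{\ell}(x)$ .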

Note that the precision depends on the test data sampled from the distribution $\sigma_{\mathit{w_{\sf real}}}(x)$ , hence on the real world $\mathit{w_{\sf real}}$ in which we are located. Hence the measurement variable $x$ in $\mathsf{Precision}_{\ell,I}(x)$ is interpreted using the stochastic assignment $\sigma_{\mathit{w_{\sf real}}}$ in the world $\mathit{w_{\sf real}}$ .

Symmetrically, recall (true positive rate) is defined as the conditional probability that the prediction is correct given that the actual class is positive; i.e., ${\it recall}=\frac{\mathit{tp}}{\mathit{tp}+\mathit{fn}}$ . Then the recall being within $I$ is formalized as:

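$\mathfrak{M},\mathit{w_{\sf real}}\models\mathsf{Recall}_{\ell,I}(x)$ where $\mathsf{Recall}_{\ell,I}(x)\stackrel{\mathrm{def}}{=}h_{\ell}(x)\supset\mathop{\mathbb{P}_{I}}\psi_{\ell}(x)$ .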

In Table 1 we show the formalization of other notions of classification performance using StatEL.
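As an illustration of precision and recall being within an interval $I$, the following sketch computes both as conditional probabilities over a finite test set and checks interval membership. It assumes a uniform distribution over the listed inputs; the function names are hypothetical, and treating an undefined conditional probability (empty condition) as not satisfying the property is a choice made for this sketch, not something fixed by the text.

```python
from fractions import Fraction


def conditional_probability(inputs, event, condition):
    """Pr[event | condition] under the uniform distribution over `inputs`.
    Returns None when the condition never holds (probability undefined)."""
    conditioned = [v for v in inputs if condition(v)]
    if not conditioned:
        return None
    return Fraction(sum(1 for v in conditioned if event(v)), len(conditioned))


def precision(inputs, classifier, oracle, label):
    """tp / (tp + fp): the actual class is `label` given that the prediction is."""
    return conditional_probability(
        inputs, lambda v: oracle(v) == label, lambda v: classifier(v) == label)


def recall(inputs, classifier, oracle, label):
    """tp / (tp + fn): the prediction is `label` given that the actual class is."""
    return conditional_probability(
        inputs, lambda v: classifier(v) == label, lambda v: oracle(v) == label)


def within(value, interval):
    """Membership of a (possibly undefined) value in the closed interval (lo, hi)."""
    lo, hi = interval
    return value is not None and lo <= value <= hi
```

Precision conditions on the prediction and recall conditions on the actual class, which mirrors the symmetry between the two notions described above.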

6 Formalizing the Robustness of Classifiers

Many studies have found attacks on the robustness of statistical machine learning [8]. An input that violates the robustness of a classifier is called an adversarial example [36]. It is designed to make a classifier fail to predict the actual class $\ell$ , while humans still recognize it as belonging to $\ell$ . For example, in computer vision, Goodfellow et al. [18] create an image by adding undetectable noise to a panda’s photo so that humans can still recognize the perturbed image as a panda, but a classifier misclassifies it as a gibbon.

In this section we formalize robustness notions for classifiers by using epistemic operators in StatEL (See Fig. 2 for an overview of the formalization). In addition, we present some relationships between classification performance and robustness, which suggest robustness-related properties that have not been formalized in the literature as far as we know.

6.1 Total Correctness of Classifiers

We first note that the total correctness of classifiers could be formalized as a classification performance (e.g., precision, recall, or accuracy) in the presence of all possible inputs from adversaries. For example, the total correctness could be formalized as $\mathfrak{M}\models\mathsf{Recall}_{\ell,I}(x)$ , which represents that $\mathsf{Recall}_{\ell,I}(x)$ is satisfied in all possible worlds of $\mathfrak{M}$ .

In practice, however, it is not possible or tractable to check whether the classification performance is achieved for all possible datasets and for all possible inputs, e.g., when $\mathcal{W}$ is an infinite set. Hence we need weaker notions of correctness that can be tested in some way. In the following sections, we deal with robustness notions that are weaker than total correctness.

6.2 Probabilistic Robustness against Targeted Attacks

When a robustness attack aims at misclassifying an input as a specific target label, then it is called a targeted attack. For instance, in the above-mentioned attack by [18], a gibbon is the target into which a panda’s photo is misclassified.

To formalize the robustness, let $\mathcal{R}_{\!\varepsilon}^{\!D}\subseteq\mathcal{W}\times\mathcal{W}$ be an accessibility relation that relates two worlds having close inputs, i.e.,

[TABLE]

where $D$ is some divergence or distance. Intuitively, $(w,w^{\prime})\in\mathcal{R}_{\!\varepsilon}^{\!D}$ implies that the two distributions $\sigma_{w}(x)$ and $\sigma_{w^{\prime}}(x)$ of inputs to the classifier $C$ represent close datasets in terms of $D$ (e.g., two test datasets consisting of slightly different images that look like pandas to the human eye). Then an epistemic formula $\mathop{\mathsf{K}_{\varepsilon}^{\!D}}\varphi$ represents that we are confident that $\varphi$ is true as long as the classifier $C$ classifies test data that are perturbed by noise of level $\varepsilon$ or smaller. (Footnote 2: This usage of modality relies on the fact that the value of the measurement variable $x$ can be different in different possible worlds.)

Now we discuss how we formalize robustness using the epistemic operator $\mathop{\mathsf{K}_{\varepsilon}^{\!D}}$ as follows. A first definition of robustness against targeted attacks might be:

[TABLE]

which represents that a panda’s photo $x$ will not be recognized as a gibbon at all after the photo is perturbed by noise. However, this does not express probability or cover the case where the human cannot recognize the perturbed image as a panda, for example, when the image is perturbed by a transformation such as linear displacement, rescaling and rotation [2]. Instead, for some $\delta\in[0,1]$ , we formalize a notion of probabilistic robustness against targeted attacks by:

[TABLE]

Since $L^{p}$ -norms are often regarded as reasonable approximations of human perceptual distances [6], many studies of targeted attacks use them as distance constraints on the perturbation (e.g., [36, 18, 6]). To represent the robustness against these attacks in our model, we should take the metric $D$ as the $\infty$ -Wasserstein distance $\mathit{W}_{d}$ (in terms of the $L^{p}$ metric $d$ ) between the two distributions $\sigma_{w}(x)$ and $\sigma_{w^{\prime}}(x)$ . (Footnote 3: A coupling that achieves $\mathit{W}_{d}(\sigma_{w}(x),\sigma_{w^{\prime}}(x))\leq\varepsilon$ provides a transformation of an image in ${\mathtt{supp}}(\sigma_{w}(x))$ to another in ${\mathtt{supp}}(\sigma_{w^{\prime}}(x))$ perturbed by a level $\varepsilon$ of noise.)
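Because $\mathcal{W}$ cannot be enumerated, probabilistic robustness cannot be verified by testing, but a finite family of perturbed datasets can refute it. The following is a rough sketch under several assumptions that are not taken from the paper: datasets are finite lists of numeric tuples under the uniform distribution, a perturbed dataset pairs with the original index by index (this pairing plays the role of the coupling, so a maximal $L^{\infty}$ displacement of at most $\varepsilon$ witnesses an $\infty$-Wasserstein distance of at most $\varepsilon$), perturbation leaves the actual class unchanged, and robustness is read as bounding by $\delta$ the probability of the target label given the actual class. All function names are hypothetical.

```python
def linf(u, v):
    """L-infinity distance between two equal-length numeric tuples."""
    return max(abs(a - b) for a, b in zip(u, v))


def within_epsilon(inputs, perturbed, eps):
    """True if the index-wise pairing moves every input by at most eps."""
    return len(inputs) == len(perturbed) and all(
        linf(u, v) <= eps for u, v in zip(inputs, perturbed))


def target_rate(inputs, true_labels, classifier, actual, target):
    """Pr[C(x) = target | actual class = actual]; None if no such input exists."""
    relevant = [v for v, y in zip(inputs, true_labels) if y == actual]
    if not relevant:
        return None
    return sum(1 for v in relevant if classifier(v) == target) / len(relevant)


def refutes_target_robustness(inputs, true_labels, perturbed_family,
                              classifier, actual, target, eps, delta):
    """Search a finite family of perturbed datasets for a counterexample.
    True means some eps-close dataset is misclassified as `target` with
    probability above delta; False only means none was found."""
    for perturbed in perturbed_family:
        if not within_epsilon(inputs, perturbed, eps):
            continue  # not related to the original world at level eps
        rate = target_rate(perturbed, true_labels, classifier, actual, target)
        if rate is not None and rate > delta:
            return True
    return False
```

A result of `False` does not establish robustness, since only the sampled perturbations are examined; this matches the earlier caution that formulas are not checked over all of $\mathcal{W}$.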

6.3 Probabilistic Robustness against Non-Targeted Attacks

Next we formalize non-targeted attacks [31, 30] in which adversaries try to misclassify inputs as some arbitrary incorrect labels (i.e., not as a specific label like a gibbon). Compared to targeted attacks, this kind of attack is easier to mount, but harder to defend against.

A notion of probabilistic robustness against non-targeted attacks can be formalized for some $I=[1-\delta,1]$ by:

[TABLE]

Then we derive that $\mathsf{TotalRobust}_{{\sf panda},I}(x)$ implies $\mathsf{TargetRobust}_{{\sf panda},\delta}(x,{\sf gibbon})$ , namely, robustness against non-targeted attacks is not weaker than robustness against targeted attacks.
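The implication can be checked with a short calculation. The following is a sketch under two assumptions stated here for self-containedness: the classifier $C$ assigns exactly one label to each input, and in every world $w^{\prime}$ related to the current one by $\mathcal{R}_{\!\varepsilon}^{\!D}$ the formula $\mathsf{TotalRobust}_{{\sf panda},I}(x)$ with $I=[1-\delta,1]$ bounds the probability of predicting panda, given that the actual class is panda, from below by $1-\delta$, while $\mathsf{TargetRobust}_{{\sf panda},\delta}(x,{\sf gibbon})$ bounds the probability of predicting gibbon under the same condition from above by $\delta$. Writing $\Pr_{w^{\prime}}$ for probability in $w^{\prime}$ (a notation introduced only for this sketch), the events $\psi_{\sf panda}(x)$ and $\psi_{\sf gibbon}(x)$ are disjoint, hence:

```latex
\begin{align*}
\Pr_{w'}\bigl[\psi_{\sf gibbon}(x) \mid h_{\sf panda}(x)\bigr]
  &\le 1 - \Pr_{w'}\bigl[\psi_{\sf panda}(x) \mid h_{\sf panda}(x)\bigr] \\
  &\le 1 - (1-\delta) \;=\; \delta .
\end{align*}
```

Since this holds in every accessible world $w^{\prime}$, the targeted property follows from the non-targeted one; the same argument applies to any target label other than panda.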

Next we note that by (6), robustness can be regarded as recall in the presence of perturbation noise. This implies that for each property $\varphi$ in Table 1, we could consider $\mathop{\mathsf{K}_{\varepsilon}^{\!D}}\varphi$ as a property related to robustness, although as far as we know these have not been formalized in the literature on the robustness of machine learning. For example, $\mathop{\mathsf{K}_{\varepsilon}^{\!D}}\mathsf{Precision}_{\ell,i}(x)$ represents that in the presence of perturbation noise, the prediction is correct with a probability $i$ given that it is positive. For another example, $\mathop{\mathsf{K}_{\varepsilon}^{\!D}}\mathsf{Accuracy}_{\ell,i}(x)$ represents that in the presence of perturbation noise, the prediction is correct (whether it is positive or negative) with a probability $i$ .

Finally, note that by the reflexivity of $\mathcal{R}_{\!\varepsilon}^{\!D}$ , $\mathfrak{M},\mathit{w_{\sf real}}\models\mathop{\mathsf{K}_{\varepsilon}^{\!D}}\mathsf{Recall}_{\ell,I}(x)$ implies $\mathfrak{M},\mathit{w_{\sf real}}\models\mathsf{Recall}_{\ell,I}(x)$ , i.e., robustness implies recall without perturbation noise.

7 Formalizing the Fairness of Classifiers

Various notions of fairness in machine learning have been studied. In this section, we formalize a few notions of fairness of classifiers by using StatEL. Here we focus on the fairness that should be maintained in the impact, i.e., the results of classification, rather than the treatment. (Footnote 4: For instance, fairness through unawareness requires that protected attributes (e.g., race, religion, or gender) are not explicitly used in the prediction process. However, StatEL may not be suited to formalizing such a property in treatment.)

To formalize fairness notions, we use a distributional Kripke model $\mathfrak{M}=(\mathcal{W},(\mathcal{R}_{a})_{a\in\mathcal{A}},\allowbreak(V_{s})_{s\in\mathcal{S}})$ where $\mathcal{W}$ includes a possible world $w_{d}$ having a dataset $d$ from which an input to the classifier $C$ is drawn. Recall that $x$ , $y$ , and $\hat{y}$ are measurement variables denoting the input to the classifier $C$ , the actual class label, and the predicted label by $C$ , respectively. In each world $w$ , $\sigma_{w}(x)$ is the distribution of $C$ ’s input over $\mathcal{D}$ , (i.e., the test data distribution), $\sigma_{w}(y)$ is the distribution of the actual label over $\mathtt{L}$ , and $\sigma_{w}(\hat{y})$ is the distribution of $C$ ’s output over $\mathtt{L}$ . For each group $G\subseteq\mathcal{D}$ of inputs, we introduce a static formula $\eta_{G}(x)$ representing that an input $x$ belongs to $G$ . We also introduce a formula $\xi_{d}$ representing that all data are drawn from some subset of the dataset $d$ . Formally, these are interpreted by:

•

For each state $s\in\mathcal{S}$ , $s\models\eta_{G}(x)$ iff $\sigma_{s}(x)\in G$ ;

•

For each world $w\in\mathcal{W}$ , $w\models\xi_{d}$ iff there exists a $\mathcal{S}^{\prime}\subseteq\mathcal{S}$ s.t. $w[s]=\frac{w_{d}[s]}{\sum_{s^{\prime}\in\mathcal{S}^{\prime}}w_{d}[s^{\prime}]}$ if $s\in\mathcal{S}^{\prime}$ , and $w[s]=0$ otherwise.

For two worlds $w$ and $w^{\prime}$ , we write $w\models\mathop{\mathbb{Q}_{w^{\prime}}}\psi$ to denote that $w\models\mathop{\mathbb{P}_{1}}\psi$ and $s\not\models\psi$ for all $s\in{\mathtt{supp}}(w^{\prime})\setminus{\mathtt{supp}}(w)$ .
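The interpretation of $\xi_{d}$ says that $w$ is the world $w_{d}$ conditioned on some set $\mathcal{S}^{\prime}$ of states. A minimal sketch of this condition follows, assuming that worlds are finite dictionaries from states to exact probabilities; the names `restrict` and `satisfies_xi` are hypothetical. The check uses the observation that, if a suitable $\mathcal{S}^{\prime}$ exists, the support of $w$ can serve as one.

```python
from fractions import Fraction


def restrict(w_d, subset):
    """Condition the distribution `w_d` (dict: state -> probability) on
    `subset`: renormalise inside the subset, drop everything else."""
    total = sum(p for s, p in w_d.items() if s in subset)
    if total == 0:
        raise ValueError("subset has probability zero in w_d")
    return {s: p / total for s, p in w_d.items() if s in subset and p > 0}


def satisfies_xi(w, w_d):
    """Check w |= xi_d: is `w` equal to `w_d` conditioned on some subset?"""
    support = {s for s, p in w.items() if p > 0}
    if not support or any(w_d.get(s, 0) == 0 for s in support):
        return False
    return {s: w[s] for s in support} == restrict(w_d, support)
```

For example, conditioning a world with probabilities $1/2, 1/4, 1/4$ on its first two states yields $2/3, 1/3$, which satisfies $\xi_{d}$, whereas a world with probabilities $1/2, 1/2$ on those states does not.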

Then we obtain the following proposition on conditional indistinguishability.

Proposition 2 (Conditional indistinguishability in a world $w_{d}$ )

Let $\mathfrak{M}=(\mathcal{W},(\mathcal{R}_{a})_{a\in\mathcal{A}},\allowbreak(V_{s})_{s\in\mathcal{S}})$ be a distributional Kripke model with the universe $\mathcal{W}=\mathbb{D}\mathcal{S}$ . Let $w_{d}$ be a world with a dataset $d$ , $\psi$ and $\psi^{\prime}$ be static formulas, and $a\in\mathcal{A}$ .

(i)

$\mathfrak{M},w_{d}\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}\bigr{)}$ iff for any $w,w^{\prime}\in\mathcal{W}$ , $\mathfrak{M},w\models\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi$ and $\mathfrak{M},w^{\prime}\models\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}$ imply $(w,w^{\prime})\in\mathcal{R}_{a}$ .

(ii)

If $\mathcal{R}_{a}$ is symmetric, then $\mathfrak{M},w_{d}\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}$ iff $\mathfrak{M},w_{d}\models\psi^{\prime}\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{Q}_{w_{d}}}\psi$ .

See Appendix 0.A for the proof.

Now we formalize three popular notions of fairness of classifiers by using counterfactual epistemic operators (introduced in Section 3) as follows.

7.1 Group Fairness (Statistical Parity)

The group fairness formulated as statistical parity [13] is the property that the output distributions of the classifier are identical for different groups. Formally, for each $b=0,1$ and a group $G_{b}\subseteq\mathcal{D}$ , let $\mu_{G_{b}}$ be the distribution of the output (over $\mathtt{L}$ ) of the classifier $C$ when the input is sampled from a dataset $d$ and belongs to $G_{b}$ . Then the statistical parity up to bias $\varepsilon$ is formalized using the total variation $\mathit{D}_{\sf tv}$ by $\mathit{D}_{\sf tv}(\mu_{G_{0}}\|\mu_{G_{1}})\leq\varepsilon$ .

To express this using StatEL, we define an accessibility relation $\mathcal{R}_{\!\varepsilon}^{\sf tv}$ in $\mathfrak{M}$ by:

[TABLE]

Intuitively, $(w,w^{\prime})\in\mathcal{R}_{\!\varepsilon}^{\sf tv}$ represents that the two probability distributions $\sigma_{w}(\hat{y})$ and $\sigma_{w^{\prime}}(\hat{y})$ of the outputs by the classifier $C$ respectively in $w$ and in $w^{\prime}$ are close in terms of $\mathit{D}_{\sf tv}$ . Note that $\sigma_{w}(\hat{y})$ and $\sigma_{w^{\prime}}(\hat{y})$ respectively represent $\mu_{G_{0}}$ and $\mu_{G_{1}}$ .

Then the statistical parity w.r.t. groups $G_{0},G_{1}$ means that in terms of $\mathcal{R}_{\!\varepsilon}^{\sf tv}$ , we cannot distinguish a world having a dataset $d$ and satisfying $\eta_{G_{0}}(x)\wedge\psi(x,\hat{y})$ from another satisfying $\eta_{G_{1}}(x)\wedge\psi(x,\hat{y})$ . By Proposition 2, this is expressed as:

[TABLE]

where $\mathsf{GrpFair}(x,\hat{y})\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\bigl{(}\eta_{G_{0}}(x)\wedge\psi(x,\hat{y})\bigr{)}\supset\neg\overline{\mathop{\mathsf{P}_{\varepsilon}^{\sf tv}}}\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}(\eta_{G_{1}}(x)\wedge\psi(x,\hat{y}))\bigr{)}$ .
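The underlying statistical condition $\mathit{D}_{\sf tv}(\mu_{G_{0}}\|\mu_{G_{1}})\leq\varepsilon$ can be checked directly on a finite dataset. The following is a minimal sketch, assuming a deterministic classifier, a uniform distribution over the listed inputs, and groups given as membership predicates; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def output_distribution(inputs, classifier):
    """Distribution of the classifier's output over a finite list of inputs."""
    if not inputs:
        raise ValueError("group must contain at least one input")
    counts = Counter(classifier(v) for v in inputs)
    return {label: Fraction(c, len(inputs)) for label, c in counts.items()}


def total_variation(mu, nu):
    """Total variation distance between two finite distributions (dicts)."""
    labels = set(mu) | set(nu)
    return sum(abs(mu.get(l, 0) - nu.get(l, 0)) for l in labels) / 2


def statistical_parity(dataset, in_g0, in_g1, classifier, eps):
    """Check D_tv(mu_G0 || mu_G1) <= eps for the two groups of `dataset`."""
    mu_g0 = output_distribution([v for v in dataset if in_g0(v)], classifier)
    mu_g1 = output_distribution([v for v in dataset if in_g1(v)], classifier)
    return total_variation(mu_g0, mu_g1) <= eps
```

With $\varepsilon=0$ the check requires the two output distributions to coincide exactly, which is the strict form of statistical parity.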

7.2 Individual Fairness (as Lipschitz Property)

The individual fairness formulated as a Lipschitz property [13] is the property that the classifier outputs similar labels given similar inputs. Formally, for $v,v^{\prime}\in\mathcal{D}$ , let $\mu_{v}$ and $\mu_{v^{\prime}}$ be the distributions of the outputs (over $\mathtt{L}$ ) of the classifier $C$ when the inputs are $v$ and $v^{\prime}$ , respectively. Then the individual fairness is formalized using a divergence $D:\mathbb{D}\mathtt{L}\times\mathbb{D}\mathtt{L}\rightarrow\mathbb{R}^{\geq 0}$ , a metric $r:\mathcal{D}\times\mathcal{D}\rightarrow\mathbb{R}^{\geq 0}$ , and a threshold $\varepsilon\in\mathbb{R}^{\geq 0}$ by $\mathit{D}(\mu_{v}\parallel\mu_{v^{\prime}})\leq\varepsilon\cdot r(v,v^{\prime})$ .

To express this using StatEL, we define an accessibility relation $\mathcal{R}_{\!\varepsilon}^{r,D}$ in $\mathfrak{M}$ for the metric $r$ and the divergence $D$ as follows:

[TABLE]

Intuitively, $(w,w^{\prime})\in\mathcal{R}_{\!\varepsilon}^{r,D}$ represents that, when inputs are closer in terms of the metric $r$ , the classifier $C$ outputs closer labels in terms of the divergence $D$ .

Then the individual fairness w.r.t. $r$ and $D$ means that in terms of $\mathcal{R}_{\!\varepsilon}^{r,D}$ , we cannot distinguish between the two worlds $w$ and $w^{\prime}$ where $\psi(x,\hat{y})$ is satisfied (i.e., $C$ outputs $\hat{y}$ given an input $x$ ). By Proposition 2, this is expressed as:

[TABLE]

where $\mathsf{IndFair}(x,\hat{y})\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\psi(x,\hat{y})\supset\neg\overline{\mathop{\mathsf{P}_{\varepsilon}^{r,D}}}\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi(x,\hat{y})\bigr{)}$ .

This represents that by observing the classifier’s output $\hat{y}$ , we are less able to distinguish two worlds $w$ and $w^{\prime}$ when their inputs $\sigma_{w}(x)$ and $\sigma_{w^{\prime}}(x)$ are closer.
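The Lipschitz condition $\mathit{D}(\mu_{v}\parallel\mu_{v^{\prime}})\leq\varepsilon\cdot r(v,v^{\prime})$ can be tested pairwise on a finite set of inputs. The sketch below assumes a probabilistic classifier exposed as a function returning a label distribution for each input, and uses total variation as one possible choice of $D$; the names `total_variation` and `lipschitz_violations` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def total_variation(mu, nu):
    """Total variation distance between two finite distributions (dicts)."""
    labels = set(mu) | set(nu)
    return sum(abs(mu.get(l, 0) - nu.get(l, 0)) for l in labels) / 2


def lipschitz_violations(inputs, predict_dist, r, eps, divergence=total_variation):
    """Return the ordered pairs (v, u) with D(mu_v || mu_u) > eps * r(v, u).
    Ordered pairs are used because a divergence need not be symmetric."""
    return [(v, u) for v, u in permutations(inputs, 2)
            if divergence(predict_dist(v), predict_dist(u)) > eps * r(v, u)]
```

An empty result means that individual fairness holds on the tested inputs for the given $\varepsilon$; as with robustness, this says nothing about inputs outside the tested set.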

7.3 Equal Opportunity

Equal opportunity [21, 40] is the property that the recall (true positive rate) is the same for all the groups. Formally, given an advantage class $\ell\in\mathtt{L}$ (e.g., not defaulting on a loan) and a group $G\subseteq\mathcal{D}$ of inputs with a protected attribute (e.g., race), a classifier $C$ is said to satisfy equal opportunity of $\ell$ w.r.t. $G$ if it holds for each $\hat{\ell}\in\mathtt{L}$ that:

[TABLE]

If we allow the logic to use the universal quantification over the probability value $i$ , then the case of $\hat{\ell}=\ell$ in (11) could be expressed as:

[TABLE]

However, instead of allowing for this universal quantification, we can use the modal operators $\overline{\mathop{\mathsf{P}_{\varepsilon}^{\sf tv}}}$ (defined by (7)) with $\varepsilon=0$ , and represent equal opportunity as the fact that we cannot distinguish a world having a dataset $d$ and satisfying $\eta_{G}(x)\wedge\psi(x,\hat{y})\wedge h_{\ell}(x)$ from another satisfying $\neg\eta_{G}(x)\wedge\psi(x,\hat{y})\wedge h_{\ell}(x)$ as follows:

[TABLE]
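On a finite dataset, equal opportunity amounts to comparing the classifier's output distributions inside and outside the group $G$, restricted to inputs whose actual class is the advantage class $\ell$. The following is a minimal sketch under the assumptions of a deterministic classifier and a uniform distribution over the listed inputs; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def output_distribution(inputs, classifier):
    """Distribution of the classifier's output over a finite list of inputs."""
    if not inputs:
        raise ValueError("no input with the advantage class on one side of the group")
    counts = Counter(classifier(v) for v in inputs)
    return {label: Fraction(c, len(inputs)) for label, c in counts.items()}


def equal_opportunity(dataset, classifier, oracle, label, in_group):
    """Among inputs whose actual class is `label`, compare the output
    distribution inside and outside the group; equality of the two
    corresponds to total variation distance 0."""
    advantaged = [v for v in dataset if oracle(v) == label]
    inside = output_distribution([v for v in advantaged if in_group(v)], classifier)
    outside = output_distribution([v for v in advantaged if not in_group(v)], classifier)
    return inside == outside
```

Comparing whole output distributions covers every predicted label $\hat{\ell}$ at once; the case $\hat{\ell}=\ell$ alone is the equality of recall between the two groups.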

8 Related Work

In this section, we provide a brief overview of related work on the specification of statistical machine learning and on epistemic logic for describing specification.

Desirable properties of statistical machine learning.

There have been a large number of papers on attacks and defences for deep neural networks [36, 8]. Compared to them, however, not much work has been done to explore the formal specification of various properties of machine learning. Seshia et al. [34] present a list of desirable properties of DNNs (deep neural networks), although most of the properties are presented informally without mathematical formulas. As for robustness, Dreossi et al. [11] propose a unifying formalization of adversarial input generation in a rigorous and organized manner, although they formalize and classify attacks (as optimization problems) rather than define the robustness notions themselves. Concerning the fairness notions, Gajane [16] surveys the formalization of fairness notions for machine learning and presents some justification based on social science literature.

Epistemic logic for describing specification.

Epistemic logic [39] has been studied to represent and reason about knowledge [14, 19, 20], and has been applied to describe various properties of systems.

The BAN logic [5], proposed by Burrows, Abadi and Needham, is a notable example of epistemic logic used to model and verify the authentication in cryptographic protocols. To improve the formalization of protocols’ behaviours, some epistemic approaches integrate process calculi [23, 10, 7].

Epistemic logic has also been used to formalize and reason about privacy properties, including anonymity [35, 20, 17, 27], receipt-freeness of electronic voting protocols [24], and privacy policy for social network services [32]. Temporal epistemic logic is used to express information flow security policies [3].

Concerning the formalization of fairness notions, previous work in formal methods has modeled different kinds of fairness involving timing by using temporal logic rather than epistemic logic. As far as we know, no previous work has formalized fairness notions of machine learning using counterfactual epistemic operators.

Formalization of statistical properties.

In studies of philosophical logic, Lewis [29] presents the idea that when a random value has various possible probability distributions, those distributions should be represented on distinct possible worlds. Bana [4] puts Lewis’s idea in a mathematically rigorous setting. Recently, a modal logic called statistical epistemic logic [26] was proposed and used to formalize statistical hypothesis testing and the notion of differential privacy [12]. Independently of that work, French et al. [15] propose a probability model for a dynamic epistemic logic in which each world is associated with a subjective probability distribution over the universe, without dealing with non-deterministic inputs or statistical divergence.

9 Conclusion

We have shown a logical approach to formalizing statistical classifiers and their desirable properties in a simple and abstract way. Specifically, we have introduced a formal model for probabilistic behaviours of classifiers and non-deterministic adversarial inputs using a distributional Kripke model. Then we have formalized the classification performance, robustness, and fairness of classifiers by using StatEL. Moreover, we have also clarified some relationships among properties of classifiers, and relevance between classification performance and robustness. To formalize fairness notions, we have introduced a notion of counterfactual knowledge and shown some techniques to express conditional indistinguishability. As far as we know, this is the first work that uses logical formulas to express statistical properties of machine learning, and that provides epistemic (resp. counterfactually epistemic) views on robustness (resp. fairness) of classifiers.

In future work, we are planning to include temporal operators in the specification language and to formally reason about system-level properties of learning-based systems. We are also interested in developing a general framework for the formal specification of machine learning associated with testing methods and possibly extended with Bayesian networks. Our future work also includes an extension of StatEL to formalize machine learning other than classification problems. Another possible direction of future work would be to clarify the relationships between our counterfactual epistemic operators and more general notions of counterfactual knowledge in previous work such as [38].

Appendix 0.A Proofs for Propositions 1 and 2

See Proposition 1.

Proof

We first prove the claim (i) as follows. We show the direction from left to right. Assume that $\mathfrak{M}\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{P}_{1}}\psi^{\prime}$ . Let $w,w^{\prime}\in\mathcal{W}$ satisfy $\mathfrak{M},w\models\mathop{\mathbb{P}_{1}}\psi$ and $\mathfrak{M},w^{\prime}\models\mathop{\mathbb{P}_{1}}\psi^{\prime}$ . Then $w|_{\psi}=w$ . By $\mathfrak{M},w\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{P}_{1}}\psi^{\prime}$ , we obtain $\mathfrak{M},w|_{\psi}\models\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{P}_{1}}\psi^{\prime}$ , which is logically equivalent to $\mathfrak{M},w|_{\psi}\models\overline{\mathop{\mathsf{K}_{a}}}\neg\mathop{\mathbb{P}_{1}}\psi^{\prime}$ . By the definition of $\overline{\mathop{\mathsf{K}_{a}}}$ , for every $w^{\prime\prime}\in\mathcal{W}$ , $\mathfrak{M},w^{\prime\prime}\models\mathop{\mathbb{P}_{1}}\psi^{\prime}$ implies $(w|_{\psi},w^{\prime\prime})\in\mathcal{R}_{a}$ . Then, since $w|_{\psi}=w$ and $\mathfrak{M},w^{\prime}\models\mathop{\mathbb{P}_{1}}\psi^{\prime}$ , we obtain $(w,w^{\prime})\in\mathcal{R}_{a}$ .

Next we show the other direction as follows. Assume the right hand side. Let $w\in\mathcal{W}$ such that $\mathfrak{M},w\models\mathop{\mathbb{P}_{1}}\psi$ . Then for every $w^{\prime}\in\mathcal{W}$ , $\mathfrak{M},w^{\prime}\models\mathop{\mathbb{P}_{1}}\psi^{\prime}$ implies $(w,w^{\prime})\in\mathcal{R}_{a}$ . By the definition of $\overline{\mathop{\mathsf{K}_{a}}}$ , we have $\mathfrak{M},w\models\overline{\mathop{\mathsf{K}_{a}}}\neg\mathop{\mathbb{P}_{1}}\psi^{\prime}$ , which is equivalent to $\mathfrak{M},w\models\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{P}_{1}}\psi^{\prime}$ . By $\mathfrak{M},w\models\mathop{\mathbb{P}_{1}}\psi$ , we have $w|_{\psi}=w$ , hence $\mathfrak{M},w|_{\psi}\models\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{P}_{1}}\psi^{\prime}$ . Therefore $\mathfrak{M},w\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\mathop{\mathbb{P}_{1}}\psi^{\prime}$ .

Finally, the claim (ii) follows from the claim (i) immediately. ∎

See Proposition 2.

Proof

We first prove the claim (i) as follows. We show the direction from left to right. Assume that $\mathfrak{M},w_{d}\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}\bigr{)}$ . Let $w,w^{\prime}\in\mathcal{W}$ satisfy $\mathfrak{M},w\models\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi$ and $\mathfrak{M},w^{\prime}\models\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}$ . Then $w_{d}|_{\psi}=w$ and $w_{d}|_{\psi^{\prime}}=w^{\prime}$ . By $\mathfrak{M},w_{d}\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}\bigr{)}$ , we obtain $\mathfrak{M},w_{d}|_{\psi}\models\neg\overline{\mathop{\mathsf{P}_{\!a}}}\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}\bigr{)}$ , which is logically equivalent to $\mathfrak{M},w_{d}|_{\psi}\models\overline{\mathop{\mathsf{K}_{a}}}\neg\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}\bigr{)}$ . By the definition of $\overline{\mathop{\mathsf{K}_{a}}}$ and $\mathfrak{M},w^{\prime}\models\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}$ , we have $(w_{d}|_{\psi},w^{\prime})\in\mathcal{R}_{a}$ . Therefore, by $w_{d}|_{\psi}=w$ , we obtain $(w,w^{\prime})\in\mathcal{R}_{a}$ .

Next we show the other direction as follows. Assume the right hand side. Let $w\in\mathcal{W}$ such that $\mathfrak{M},w\models\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi$ . Then for every $w^{\prime}\in\mathcal{W}$ , $\mathfrak{M},w^{\prime}\models\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}$ implies $(w,w^{\prime})\in\mathcal{R}_{a}$ . By the definition of $\overline{\mathop{\mathsf{K}_{a}}}$ , we have $\mathfrak{M},w\models\overline{\mathop{\mathsf{K}_{a}}}\neg\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}\bigr{)}$ , which is equivalent to $\mathfrak{M},w\models\neg\overline{\mathop{\mathsf{P}_{\!a}}}\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}\bigr{)}$ . By $\mathfrak{M},w\models\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi$ , we have $w_{d}|_{\psi}=w$ , hence $\mathfrak{M},w_{d}|_{\psi}\models\neg\overline{\mathop{\mathsf{P}_{\!a}}}\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}\bigr{)}$ . Therefore $\mathfrak{M},w_{d}\models\psi\supset\neg\overline{\mathop{\mathsf{P}_{\!a}}}\bigl{(}\xi_{d}\wedge\mathop{\mathbb{Q}_{w_{d}}}\psi^{\prime}\bigr{)}$ .

Finally, the claim (ii) follows from the claim (i) immediately. ∎

Bibliography

[1] Alpern, B., Schneider, F.B.: Defining liveness. Inf. Process. Lett. 21(4), 181–185 (1985). https://doi.org/10.1016/0020-0190(85)90056-0
[2] Athalye, A., Engstrom, L., Ilyas, A., Kwok, K.: Synthesizing robust adversarial examples. In: Proc. ICML. pp. 284–293 (2018)
[3] Balliu, M., Dam, M., Guernic, G.L.: Epistemic temporal logic for information flow security. In: Proc. of PLAS. p. 6 (2011). https://doi.org/10.1145/2166956.2166962
[4] Bana, G.: Models of objective chance: An analysis through examples. In: Making it Formally Explicit. pp. 43–60. Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-55486-0_3
[5] Burrows, M., Abadi, M., Needham, R.M.: A logic of authentication. ACM Trans. Comput. Syst. 8(1), 18–36 (1990). https://doi.org/10.1145/77648.77649
[6] Carlini, N., Wagner, D.A.: Towards evaluating the robustness of neural networks. In: Proc. S&P. pp. 39–57 (2017). https://doi.org/10.1109/SP.2017.49
[7] Chadha, R., Delaune, S., Kremer, S.: Epistemic logic for the applied pi calculus. In: Proc. of FMOODS/FORTE. pp. 182–197 (2009). https://doi.org/10.1007/978-3-642-02138-1_12
[8] Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A., Mukhopadhyay, D.: Adversarial attacks and defences: A survey. CoRR abs/1810.00069 (2018), http://arxiv.org/abs/1810.00069
