Iterated Belief Revision Under Resource Constraints: Logic as Geometry

Dan P. Guralnik; Daniel E. Koditschek

arXiv:1812.08313·cs.AI·December 21, 2018

TL;DR

This paper introduces the universal memory architecture (UMA), a geometry-based belief revision method for resource-constrained settings like mobile robots, offering computational efficiency and model comparison capabilities.

Contribution

It develops the formalism of UMA, linking inference to geometry via duality, and analyzes its complexity, learning guarantees, and practical viability through simulations.

Findings

(1) UMA reduces computational costs in belief revision.

(2) The duality framework enables model space comparisons.

(3) Simulation results demonstrate UMA's practical effectiveness.

Equations

\mathbf{\Sigma}(A):=\{\bot,\top\}\cup\bigcup_{a\in A}\{a,\neg a\}

S\text{ is coherent}\;\Leftrightarrow\;S\cap S^{\ast}{\downarrow}=\varnothing\;\Leftrightarrow\;S{\uparrow}\cap S^{\ast}=\varnothing\;\Leftrightarrow\;S{\uparrow}\cap S^{\ast}{\downarrow}=\varnothing\,,
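The coherence test can be made concrete with a small sketch. It assumes (our reading; the closures are not defined in this excerpt) that $S{\uparrow}$ is the set of literals reachable from $S$ along the pairs $(a,b)$ of $G$, read as implications $a\rightarrow b$, and that $X{\downarrow}$ follows those pairs backwards. Literals are strings with a trailing `*` marking the complement, and the helpers `star`, `closure`, `is_coherent`, `is_coherent_down` are hypothetical names. Both characterizations are computed so that their agreement can be checked.

```python
from collections import deque


def star(lit: str) -> str:
    """Complement of a literal, written 'a' <-> 'a*'."""
    return lit[:-1] if lit.endswith("*") else lit + "*"


def closure(S, G, downward=False):
    """S-up: every literal reachable from S along pairs (a, b) in G.
    With downward=True the pairs are followed backwards, giving S-down."""
    succ = {}
    for a, b in G:
        src, dst = (b, a) if downward else (a, b)
        succ.setdefault(src, set()).add(dst)
    seen, queue = set(S), deque(S)
    while queue:  # breadth-first search, linear in the size of G
        x = queue.popleft()
        for y in succ.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def is_coherent(S, G):
    """S is coherent iff S-up is disjoint from S*."""
    return not (closure(S, G) & {star(x) for x in S})


def is_coherent_down(S, G):
    """Equivalent test: S is disjoint from (S*)-down."""
    return not (set(S) & closure({star(x) for x in S}, G, downward=True))


G = {("a", "b"), ("b", "c")}          # a -> b, b -> c
print(is_coherent({"a", "c"}, G))     # True
print(is_coherent({"a", "c*"}, G))    # False: a forces c, contradicting c*
```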

\pi_{G}(a):=\left\{\begin{array}[]{ll}[a]_{G}&\text{if }a\notin N(G)\cup N(G){{}^{\scriptscriptstyle\ast}}\\ N(G)&\text{if }a\in N(G)\\ N(G){{}^{\scriptscriptstyle\ast}}&\text{if }a\in N(G){{}^{\scriptscriptstyle\ast}}\end{array}\right.

\mathfrak{h}(a;G):=\{u\in G^{\circ}\mid a\in u\}\,,\quad a\in G\,.

\mathfrak{h}(S;G):=\{u\in G^{\circ}\mid S\subseteq u\}=\bigcap_{a\in S}\mathfrak{h}(a;G)\,,\quad S\subset\mathbf{\Sigma}
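A brute-force sketch of the halfspaces, assuming that each vertex in $G^{\circ}$ is a complete selection of one literal from each pair $\{a,a^{\ast}\}$ and, for simplicity, that $G$ imposes no relations, so $G^{\circ}$ is the full cube on the atoms. The names `cube`, `h` and `h_set` are hypothetical. The last line checks the identity $\mathfrak{h}(S;G)=\bigcap_{a\in S}\mathfrak{h}(a;G)$ on an example.

```python
from itertools import product


def star(lit: str) -> str:
    """Complement of a literal, written 'a' <-> 'a*'."""
    return lit[:-1] if lit.endswith("*") else lit + "*"


def cube(atoms):
    """All complete selections of one literal from each pair {a, a*}."""
    return [frozenset(a if keep else star(a) for a, keep in zip(atoms, bits))
            for bits in product((True, False), repeat=len(atoms))]


def h(a, vertices):
    """Halfspace h(a; G): the vertices u with a in u."""
    return {u for u in vertices if a in u}


def h_set(S, vertices):
    """h(S; G): the vertices containing every literal of S."""
    return {u for u in vertices if set(S) <= u}


V = cube(["a", "b", "c"])
S = {"a", "b*"}
print(len(h_set(S, V)))                                         # 2 (c is free)
print(h_set(S, V) == set.intersection(*(h(x, V) for x in S)))   # True
```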

\mathrm{med}(u,v,w)=(u\cap v)\cup(u\cap w)\cup(v\cap w)\,,\quad u,v,w\in G^{\circ}\,,

I(u,v)=\{w\in G^{\circ}\mid u\cap v\subseteq w\}=\{w\in G^{\circ}\mid w\subseteq u\cup v\}\,.
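A sketch of the median and the interval on vertices represented as sets of literals, assuming (for illustration) that each vertex is a complete selection of one literal from each pair $\{a,a^{\ast}\}$ and that $G$ imposes no relations, so all such selections are present. Under these assumptions the median is a coordinatewise majority vote. The final check confirms that the two descriptions of $I(u,v)$ agree, together with the standard median-algebra fact that $w\in I(u,v)$ exactly when $\mathrm{med}(u,v,w)=w$. Function names are hypothetical.

```python
from itertools import product


def star(lit: str) -> str:
    """Complement of a literal, written 'a' <-> 'a*'."""
    return lit[:-1] if lit.endswith("*") else lit + "*"


def cube(atoms):
    """All complete selections of one literal from each pair {a, a*}."""
    return [frozenset(a if keep else star(a) for a, keep in zip(atoms, bits))
            for bits in product((True, False), repeat=len(atoms))]


def med(u, v, w):
    """Median: the literals shared by at least two of u, v, w (majority vote)."""
    return (u & v) | (u & w) | (v & w)


def interval(u, v, vertices):
    """I(u, v): the vertices containing every literal common to u and v."""
    return [w for w in vertices if (u & v) <= w]


V = cube(["a", "b", "c"])
u, v = frozenset({"a", "b", "c"}), frozenset({"a*", "b*", "c"})
print(len(interval(u, v, V)))  # 4: every vertex containing c
print(all((w in interval(u, v, V)) == (w <= u | v) == (med(u, v, w) == w)
          for w in V))         # True
```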

\mathrm{proj}_{K}(L)=(S\cup T){\uparrow}\setminus(T{\uparrow})^{\ast}=\big(S{\uparrow}\setminus T{\uparrow}^{\ast}\big)\cup\mathtt{coh}_{G}(T)\,,

\mathtt{PROP}(T,S;G):=(S\cup T){\uparrow}\setminus T^{\ast}{\downarrow}\,.
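A sketch of $\mathtt{PROP}(T,S;G)$ as a plain set computation. It assumes (this excerpt does not define the closures) that $X{\uparrow}$ collects the literals reachable from $X$ along the pairs of $G$ read as implications, and that $X{\downarrow}$ follows those pairs backwards; the example relation lists the contrapositive pairs explicitly. Reading $S$ as the current belief and $T$ as a new observation, every literal that would force the complement of something in $T$ is discarded. Helper names are hypothetical.

```python
from collections import deque


def star(lit: str) -> str:
    """Complement of a literal, written 'a' <-> 'a*'."""
    return lit[:-1] if lit.endswith("*") else lit + "*"


def closure(X, G, downward=False):
    """X-up along pairs (a, b) in G; with downward=True, X-down."""
    succ = {}
    for a, b in G:
        src, dst = (b, a) if downward else (a, b)
        succ.setdefault(src, set()).add(dst)
    seen, queue = set(X), deque(X)
    while queue:
        x = queue.popleft()
        for y in succ.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def prop(T, S, G):
    """PROP(T, S; G) = (S u T)-up minus (T*)-down."""
    up = closure(set(S) | set(T), G)
    blocked = closure({star(t) for t in T}, G, downward=True)
    return up - blocked


# a -> b -> c, together with the contrapositives c* -> b* -> a*
G = {("a", "b"), ("b", "c"), ("c*", "b*"), ("b*", "a*")}
print(sorted(prop({"c*"}, {"a"}, G)))   # ['a*', 'b*', 'c*']
print(sorted(prop(set(), {"a"}, G)))    # ['a', 'b', 'c']
```

In the first call, observing $c^{\ast}$ overrides the earlier belief $a$, since $a$ lies in $\{c\}{\downarrow}$.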

ab\in G\;\Longleftrightarrow\;\mathtt{w}_{ab^{\ast}}\text{ is negligible in comparison with }\mathtt{w}_{ab},\,\mathtt{w}_{a^{\ast}b^{\ast}}\,,

\mathtt{Curr}\big{|}_{\scriptscriptstyle{t+1}}:=\mathtt{coh}_{G}(\mathtt{Obs}\big{|}_{\scriptscriptstyle{t+1}})\subseteq\mathtt{Obs}\big{|}_{\scriptscriptstyle{t+1}}\,,

\Delta(A,G^{\circ}):=\min_{u\in G^{\circ}}\Delta(A,u)

\delta_{u,r}(F):=\left\{\begin{array}[]{cl}r&\text{if }u\in F\\ \infty&u\notin F\,.\end{array}\right.

ab\in\mathbf{Res}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\delta\right)\;\stackrel{{\scriptstyle\begin{array}[]{c}\scriptscriptstyle{def.}\\ \end{array}}}{{\Longleftrightarrow}}\{a,a{{}^{\scriptscriptstyle\ast}}\}\cap\{b,b{{}^{\scriptscriptstyle\ast}}\}=\varnothing\,\text{ and }\,\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}>\delta

ab\in\mathbf{Res}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\infty\right)\;\stackrel{{\scriptstyle\begin{array}[]{c}\scriptscriptstyle{def.}\\ \end{array}}}{{\Longleftrightarrow}}\ \{a,a{{}^{\scriptscriptstyle\ast}}\}\cap\{b,b{{}^{\scriptscriptstyle\ast}}\}=\varnothing\,,\text{ and }\,\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}=\infty\,.

\mathtt{w}(u):=\max_{a,b\in u}\mathtt{w}_{ab}\,,

ab\in\mathbf{Der}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\delta\right)\Leftrightarrow\left\{\begin{array}[]{l}\{a,a{{}^{\scriptscriptstyle\ast}}\}\cap\{b,b{{}^{\scriptscriptstyle\ast}}\}=\varnothing\,,\text{ and }\\[2.5pt] \mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}=\infty\text{ or }\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}>\delta+\max(\mathtt{w}_{ab},\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b{{}^{\scriptscriptstyle\ast}}})\,,\end{array}\right.

ab\in\mathbf{Der}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\infty\right)\Leftrightarrow\left\{\begin{array}[]{l}\{a,a{{}^{\scriptscriptstyle\ast}}\}\cap\{b,b{{}^{\scriptscriptstyle\ast}}\}=\varnothing\,,\text{ and }\\[2.5pt] \mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}=\infty\text{ and }\mathtt{w}_{ab},\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b{{}^{\scriptscriptstyle\ast}}}<\infty\,.\end{array}\right.

\mathtt{M}(\mathtt{w}_{{\scriptscriptstyle\bullet}};\epsilon):=\left\{a\in\mathbf{\Sigma}\,\big{|}\,\mathtt{w}_{a}<\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}}-\epsilon\right\}

\mathtt{M}(\kappa;\epsilon):=\left\{a\in\mathbf{\Sigma}\,\big{|}\,\mathtt{w}^{\kappa}_{a}<\mathtt{w}^{\kappa}_{a{{}^{\scriptscriptstyle\ast}}}-\epsilon\right\}\,,

\mathtt{w}\big{|}_{\scriptscriptstyle{0}}=\delta_{\mathtt{Obs}\big{|}_{\scriptscriptstyle{0}},\varphi(\mathtt{Obs}\big{|}_{\scriptscriptstyle{0}})}^{\scriptscriptstyle{(2)}}\,,\quad\mathtt{w}\big{|}_{\scriptscriptstyle{t+1}}:=\min\left\{\mathtt{w}\big{|}_{\scriptscriptstyle{t}},\delta_{\mathtt{Obs}\big{|}_{\scriptscriptstyle{t+1}},\varphi(\mathtt{Obs}\big{|}_{\scriptscriptstyle{t+1}})}^{\scriptscriptstyle{(2)}}\right\}\,.

G\big{|}_{\scriptscriptstyle{t}}:=\mathbf{Der}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}}\big{|}_{\scriptscriptstyle{t}}\right)\,.
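A sketch of the rank-based snapshot: the update $\mathtt{w}|_{t+1}=\min\{\mathtt{w}|_t,\delta^{(2)}\}$ followed by $G|_t=\mathbf{Der}(\mathtt{w}_\bullet|_t)$ with the threshold rule of $\mathbf{Der}(\mathtt{w}_\bullet;\delta)$. We assume (the excerpt does not spell it out) that $\delta^{(2)}_{u,r}$ assigns $r$ to every pair of literals that both hold in the observation $u$ and $\infty$ to every other pair, so that $\mathtt{w}_{ab}$ is the smallest value seen on an observation containing both $a$ and $b$, with unseen pairs defaulting to $\infty$. Larger weights then behave like ranks of less expected combinations. The names `observe`, `weight`, `der` are hypothetical.

```python
import math
from itertools import combinations

INF = math.inf


def star(lit: str) -> str:
    """Complement of a literal, written 'a' <-> 'a*'."""
    return lit[:-1] if lit.endswith("*") else lit + "*"


def observe(w, u, r):
    """w <- min(w, delta2_{u,r}): each pair of literals holding in u gets min(w_xy, r)."""
    for x, y in combinations(sorted(u), 2):
        key = frozenset((x, y))
        w[key] = min(w.get(key, INF), r)


def weight(w, x, y):
    """Symmetric pair weight w_xy; pairs never observed together weigh infinity."""
    return w.get(frozenset((x, y)), INF)


def der(w, literals, delta=0.0):
    """Der(w; delta): keep (a, b) when w_{ab*} is infinite or exceeds
    delta + max(w_{ab}, w_{a*b*})."""
    G = set()
    for a in literals:
        for b in literals:
            if {a, star(a)} & {b, star(b)}:
                continue  # a and b must come from different atoms
            counter = weight(w, a, star(b))
            support = max(weight(w, a, b), weight(w, star(a), star(b)))
            if counter == INF or counter > delta + support:
                G.add((a, b))
    return G


w = {}
for u in [{"a", "b"}, {"a*", "b"}, {"a*", "b*"}]:  # no observation has a without b
    observe(w, u, 0.0)
print(sorted(der(w, ["a", "a*", "b", "b*"])))  # [('a', 'b'), ('b*', 'a*')]
```

The learned relation contains the implication $a\rightarrow b$ and its contrapositive, and nothing else, because only the combination $a\wedge b^{\ast}$ was never observed.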

\mathtt{w}_{ab}=r\cdot\delta_{u}(\mathfrak{h}(ab))\,,\quad\delta_{u}(F):=\left\{\begin{array}[]{cl}1&\text{if }u\in F\\ 0&\text{if }u\notin F\,,\end{array}\right.

\tau_{ab}\big{|}_{\scriptscriptstyle{t}}=\tau_{ba}\big{|}_{\scriptscriptstyle{t}}=\tau_{a{{}^{\scriptscriptstyle\ast}}b}\big{|}_{\scriptscriptstyle{t}}

ab\in\mathbf{Der}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}},\tau_{\scriptscriptstyle\bullet}\right)\big{|}_{\scriptscriptstyle{t}}\;\stackrel{{\scriptstyle\begin{array}[]{c}\scriptscriptstyle{def.}\\ \end{array}}}{{\Longleftrightarrow}}\left\{\begin{array}[]{c}\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}\big{|}_{\scriptscriptstyle{t}}<\min(\tau_{ab}\cdot\mathtt{w}_{\varnothing},\mathtt{w}_{ab},\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b{{}^{\scriptscriptstyle\ast}}},\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b})\\[2.5pt] \texttt{or}\\[2.5pt] \mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}=\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b}=0\,.\end{array}\right.

\mathtt{M}(\mathtt{w}_{\scriptscriptstyle\bullet}):=\left\{a\in\mathbf{\Sigma}\,\big|\,\mathtt{w}_{a}>\mathtt{w}_{a^{\ast}}\right\}\,,

\left\{\begin{array}[]{rcl}\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{0}}&:=&\varphi\big{|}_{\scriptscriptstyle{0}}\cdot\delta_{u_{0}}(\mathfrak{h}(ab))\,,\\[5.0pt] \mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t+1}}&:=&q\big{|}_{\scriptscriptstyle{t}}\cdot\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}}+(1-q\big{|}_{\scriptscriptstyle{t}})\cdot\varphi\big{|}_{\scriptscriptstyle{t+1}}\cdot\delta_{u_{t+1}}(\mathfrak{h}(ab))\,,\end{array}\right.

\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}}=\frac{1}{t}\sum_{s=0}^{t}\varphi\big{|}_{\scriptscriptstyle{s}}\cdot\delta_{u(s)}(\mathfrak{h}(ab))\,,

\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}}=q^{t}\varphi\big{|}_{\scriptscriptstyle{0}}\cdot\delta_{u(0)}(\mathfrak{h}(ab))+(1-q)\sum_{s=1}^{t}q^{t-s}\varphi\big{|}_{\scriptscriptstyle{s}}\cdot\delta_{u(s)}(\mathfrak{h}(ab))\,.
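The discounted recursion for the value-based snapshot and its unrolled form can be checked against each other numerically. The sketch below assumes a constant discount $q$, reads $\delta_{u}(\mathfrak{h}(ab))$ as 1 exactly when both literals hold in the observation $u$, and reads $\mathtt{w}_a$ in $\mathtt{M}(\mathtt{w}_\bullet)$ as the diagonal weight $\mathtt{w}_{aa}$. The observations, value samples and all function names are hypothetical.

```python
from itertools import combinations_with_replacement


def star(lit: str) -> str:
    """Complement of a literal, written 'a' <-> 'a*'."""
    return lit[:-1] if lit.endswith("*") else lit + "*"


def indicator(pair, u):
    """delta_u(h(ab)): 1.0 if both literals of the pair hold in u, else 0.0."""
    return 1.0 if pair <= u else 0.0


def init_weights(literals, u0, phi0):
    """w_ab|_0 = phi|_0 * delta_{u0}(h(ab)); diagonal pairs {a} give w_a."""
    return {frozenset(p): phi0 * indicator(frozenset(p), u0)
            for p in combinations_with_replacement(literals, 2)}


def update_weights(w, u, phi, q):
    """w_ab|_{t+1} = q * w_ab|_t + (1 - q) * phi|_{t+1} * delta_u(h(ab))."""
    return {p: q * val + (1 - q) * phi * indicator(p, u) for p, val in w.items()}


def current_state(w, literals):
    """M(w) = {a : w_a > w_a*}, using the diagonal weights."""
    return {a for a in literals if w[frozenset({a})] > w[frozenset({star(a)})]}


literals = ["a", "a*", "b", "b*"]
obs = [{"a", "b"}, {"a", "b*"}, {"a*", "b"}, {"a", "b"}]
phis = [1.0, 2.0, 0.5, 1.5]
q = 0.5

w = init_weights(literals, obs[0], phis[0])
for u, phi in zip(obs[1:], phis[1:]):
    w = update_weights(w, u, phi, q)

# unrolled (closed) form for the pair ab at t = 3
ab, t = frozenset({"a", "b"}), len(obs) - 1
closed = q**t * phis[0] * indicator(ab, obs[0]) + (1 - q) * sum(
    q**(t - s) * phis[s] * indicator(ab, obs[s]) for s in range(1, t + 1))
print(w[ab], closed)  # 0.875 0.875
```

Storing one weight per unordered pair of literals keeps the structure quadratic in the size of the sensorium.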

\varphi_{d}(p):=\left\{\begin{array}[]{rl}0&\text{if }p=T\,,\\ 1&\text{if }p\neq T\,,\end{array}\right.\quad\text{and}\quad\varphi_{s}(p):=\mathtt{dist}\!\left(p,T\right)\,,

Iterated Belief Revision Under Resource Constraints: Logic as Geometry

Dan P. Guralnik

Electrical & Systems Engineering, School of Engineering & Applied Sciences, University of Pennsylvania, Penn Engineering Research & Collaboration Hub (PERCH), 3401 Grays Ferry Ave., Pennovation Center, Building 6176, 3rd Floor, Philadelphia, PA 19146

and

Daniel E. Koditschek

Electrical & Systems Engineering, School of Engineering & Applied Sciences, University of Pennsylvania, Penn Engineering Research & Collaboration Hub (PERCH), 3401 Grays Ferry Ave., Pennovation Center, Building 6176, 3rd Floor, Philadelphia, PA 19146

Abstract.

We propose a variant of iterated belief revision designed for settings with limited computational resources, such as mobile autonomous robots. The proposed memory architecture—called the universal memory architecture (UMA)—maintains an epistemic state in the form of a system of default rules similar to those studied by Pearl and by Goldszmidt and Pearl (systems $Z$ and $Z^{+}$ ).

A duality between the category of UMA representations and the category of the corresponding model spaces, extending the Sageev-Roller duality between discrete poc sets and discrete median algebras, provides a two-way dictionary from inference to geometry. This leads to immense savings in computation, at a cost in the quality of representation that can be quantified in terms of topological invariants. Moreover, the same framework naturally enables comparisons between different model spaces, making it possible to analyze the deficiencies of one model space in comparison to others.

This paper develops the formalism underlying UMA, analyzes the complexity of maintenance and inference operations in UMA, and presents some learning guarantees for different UMA-based learners. Finally, we present simulation results to illustrate the viability of the approach, and close with a discussion of the strengths, weaknesses, and potential development of UMA-based learners.

1. Introduction

1.1. Motivation.

Iterated belief revision (BR) deals with the problem of maintaining syntactic propositional knowledge representations that are flexible enough to accommodate reasoning about a stream of incoming observations in the form of propositional formulae (over a finite alphabet of atomic propositions), while taking into account the possibility that any such observation may be inconsistent with the current state of the knowledge representation. It is therefore not unreasonable to argue that BR operators should be used for maintaining well-reasoned internal representations for autonomous learning agents (see, e.g., [47]). However, one need merely observe the high computational costs associated with revision operators [31, 30] to conclude that such representations are too expensive to implement in a mobile autonomous agent. Attempts at making the representations more palatable using prime forms [6, 33] have been made, but the fundamental complexity barriers remain [26].

We introduce a computationally cheap form of iterated propositional belief revision, the universal memory architecture (UMA), which harnesses the geometry of model spaces in place of the model-theoretic techniques characteristic of this field. The computational advantages come at the price of modifying the notion of an observation and of restricting the syntactic form of the epistemic state maintained by the agent (understood in the broad sense of Darwiche and Pearl [11]) to a special type of default system in the sense of [39]. Most notably, observations are no longer allowed to take the form of arbitrary propositional formulae; rather, we restrict them to conjunctive monomials in the underlying propositional variables. Equivalently, an observation is a partial truth-value assignment to the agent’s inputs. In addition, each observation is accompanied by a value signal: a quantity indicating a notion of the value of the experience to the agent at that time.¹

¹ The value signal should not be confused with the notion of reward as used in Reinforcement Learning. One of our learning schemes (see Section 4.2) leads to a (partial) syntactic representation of the distribution from which observations are drawn, and does not encode any preference of one state over another.

These alterations to the classical setting of iterated BR are motivated by the prospect of implementing iterated BR on mobile robotic platforms in real time. While the Boolean component of the observation corresponds to the robot’s raw sensory inputs, the value signal may correspond to an encoding of a task, or to feedback from a teacher. The limited form of the epistemic state maintained by an UMA instance reduces the space and time complexity costs of maintenance (applying the revision operator) and exploitation (e.g. inference) down to an absolute minimum, as we review next.

1.2. Contributions: Introduction and Analysis of UMAs.

Motivated by the problem of realizing iterated belief revision and update in a bounded resources setting, we seek a class of lightweight general-purpose representations. From a learning perspective, ours is a problem of learning from positive examples: an observer of an unknown, unmodeled system $\mathscr{S}$ experiences some process—a sequence of transitions—in that system through an array of Boolean sensors, and is required to reason about regularities in the observed sequence of experiences, constructing a formal theory of what is possible for that system.

We assume that observations occur in discrete time steps. An observation at time $t$ will consist of (1) a complete truth-value assignment² $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t}}$ (the observation at time $t$) to a fixed set $\mathbf{\Sigma}$ (the sensorium) of Boolean queries of the agent’s interactions with its environment; and (2) a sample $\varphi\big{|}_{\scriptscriptstyle{t}}$ of a fixed value signal, $\varphi$.

² Henceforth, the symbol $(\big{|}_{\scriptscriptstyle{t}})$ appended to anything else should be read as “at time $t$”.

Little needs to be assumed about the sensorium: for the purpose of this paper, we allow any query expressible as a Boolean function of the state history (finite or infinite) of the system $\mathscr{S}$ (an appropriate formalism is developed in Section 2.1.1); it is also assumed that truth-value assignments $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t}}$ are consistent in the sense that each agrees with the values of the available queries on the history that manifested at the corresponding time; finally, observations are assumed to be time shift-invariant in the sense that observing the same histories at different times must yield the same Boolean observation vector. The value signal, for now, is assumed to be static, in the sense that it factors through a function of the observation (more detail in Section 3.1). The architecture itself does not rely on any of these assumptions, but the learning guarantees we provide in this paper do.

An UMA representation integrates its accumulated experiences by repeatedly revising two structural components, based on the incoming observations: (a) a relation $G\big{|}_{\scriptscriptstyle{t}}$ , called a pointed complemented relation (PCR), representing a system of implications, or defaults, which the agent believes to hold true among the queries in $\mathbf{\Sigma}$ ; and (b) a set $\mathtt{Curr}\big{|}_{\scriptscriptstyle{t}}\subset\mathbf{\Sigma}$ , representing the agent’s belief regarding the current state of the system. The machinery for maintaining these data structures will be referred to as a snapshot. Briefly, our results about UMA representations are as follows.

Universality of Representation.

In our intended setting, the learner’s sensors realize the formal sensorium $\mathbf{\Sigma}$ as a family of subsets of the space of histories, closed under complementation. The possible worlds actually witnessed by points of this space correspond to the learner’s perceptual equivalence classes (in the sense of, e.g. [13, 42]). Intuitively, an element $(a,b)$ of the PCR $G\big{|}_{\scriptscriptstyle{t}}$ should be seen as correct if no history falsifies the formula $a\rightarrow b$ , and, more generally, if histories falsifying $a\rightarrow b$ are improbable, or insignificant according to the user’s formal model of these notions.

It turns out that a PCR $G$ supports a natural dual space, a set $\mathbf{M}=G^{\circ}$ of possible worlds canonically associated with the PCR. Recall that a possible world over $\mathbf{\Sigma}$ is a complete truth value assignment $\mathbf{\Sigma}\to\{\bot,\top\}$ . We prove that, given a PCR $G$ over a set of literals $\mathbf{\Sigma}$ , its dual space $\mathbf{M}$ has the following universality property (Proposition 2.22): $\mathbf{M}$ is the smallest set of possible worlds over $\mathbf{\Sigma}$ which, for any realization $\rho$ of $\mathbf{\Sigma}$ as a set of Boolean queries over a space $\mathbf{X}$ not falsifying a relation listed in $G$ , contains every model for $\rho$ .

Returning to UMA learners, this means that the model space $\mathbf{M}\big{|}_{\scriptscriptstyle{t}}$ encoded by the PCR $G\big{|}_{\scriptscriptstyle{t}}$ is a minimal envelope for the true space of possible worlds, provided just the information that all the relations recorded in $G\big{|}_{\scriptscriptstyle{t}}$ are correct.
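To make the notion of a model space concrete, here is a deliberately naive sketch. It assumes, as our reading of the universality statement, that $G^{\circ}$ consists of the complete truth-value assignments (one literal from each pair $\{a,a^{\ast}\}$) that falsify no implication $a\rightarrow b$ listed in $G$. The enumeration is exponential in the number of atoms, which is exactly the kind of search UMA is designed to avoid; it serves only to check, on a small example, that the resulting set is closed under the coordinatewise majority (median) operation. All names are hypothetical.

```python
from itertools import product


def star(lit: str) -> str:
    """Complement of a literal, written 'a' <-> 'a*'."""
    return lit[:-1] if lit.endswith("*") else lit + "*"


def dual_space(atoms, G):
    """Brute-force G°: complete assignments u with (a in u) -> (b in u) for all (a, b) in G."""
    worlds = []
    for bits in product((True, False), repeat=len(atoms)):
        u = frozenset(a if keep else star(a) for a, keep in zip(atoms, bits))
        if all(b in u for a, b in G if a in u):
            worlds.append(u)
    return worlds


def med(u, v, w):
    """Coordinatewise majority of three assignments."""
    return (u & v) | (u & w) | (v & w)


M = dual_space(["a", "b", "c"], {("a", "b"), ("b", "c")})
print(len(M))  # 4 of the 8 assignments survive
print(all(med(u, v, w) in M for u in M for v in M for w in M))  # True
```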

Computational Complexity.

From a computational perspective, the maintenance costs of an UMA representation are roughly the same as those of maintaining a neural representation (that is, the cost of maintaining and using a matrix of weights), but with the added benefit of affording a formal understanding of the model space, its geometry, and its deficiencies. Here are some results, all of which are corollaries of the geometric properties of the class of model spaces defined by PCRs. Let $N$ denote the cardinality of the sensorium $\mathbf{\Sigma}$. Then:

•

Maintaining an UMA snapshot structure requires $O(N^{2})$ space;

•

Update operations for learning the PCR structure require $O(N^{2})$ time;

•

Inference requires $O(N^{2})$ time, reducible to $O(N)$ on fully parallel hardware.³

³ We remark that our current implementation is, in fact, an $O(N^{3})$ implementation utilizing matrix multiplication on a GPU. This kind of implementation makes it possible to multiply fairly large matrices very quickly, improving on the performance of the naïve quadratic algorithm we provide later in this paper.
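One way to see where quadratic bounds of this kind come from, offered as a sketch under stated assumptions rather than as the authors' implementation: store a PCR over $N$ literals as an $N\times N$ Boolean matrix ($O(N^{2})$ space) and, assuming the stored relation is already closed under composition, compute $S{\uparrow}$ from the membership vector of $S$ in a single $O(N^{2})$ pass. Each output entry reads a single column, so with $N$ processors every entry costs $O(N)$. Warshall's $O(N^{3})$ transitive-closure algorithm is used here only to prepare the example matrix; the function names are hypothetical.

```python
def warshall(M):
    """Transitive closure of a Boolean adjacency matrix, O(N^3)."""
    N = len(M)
    R = [row[:] for row in M]
    for k in range(N):
        for i in range(N):
            if R[i][k]:
                for j in range(N):
                    if R[k][j]:
                        R[i][j] = True
    return R


def up_closure(mask, R):
    """S-up from the membership vector of S, in one O(N^2) pass over a closed R."""
    N = len(R)
    return [mask[j] or any(mask[i] and R[i][j] for i in range(N)) for j in range(N)]


N = 4
M = [[False] * N for _ in range(N)]
M[0][1] = M[1][2] = True          # literal 0 -> 1 -> 2
R = warshall(M)
print(R[0][2])                                     # True
print(up_closure([True, False, False, False], R))  # [True, True, True, False]
```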

Multiple Learning Paradigms.

The mathematical foundations for UMA provide sufficient flexibility to admit a variety of learning mechanisms and settings, spanning the range from probabilistic filtering, as proposed in [19], to a variation on [iterated] revision and update introduced in [11], while keeping maintenance costs down to the bare minimum (see preceding paragraph). Depending on the snapshot type, different learning scenarios and guarantees may be provided, while maintaining a uniform revision and update scheme at the symbolic level.

Flexibility of Representation.

A central feature of the UMA architecture is that the duality theory of PCRs allows one to interpret maps between PCRs as maps between the associated model spaces and vice versa. This makes it possible to formally introduce, and to operate with, notions of approximate equivalence, redundancy, and negligibility of queries. It also enables the study of the impact on model space geometry of operations that augment a sensorium with new queries (see, for example, Section A.2.4) or remove existing ones. In particular, this opens a way to formal (and, possibly, automated) cost/benefit analysis of such extension and pruning operations, a topic of ongoing research that we touch upon briefly in our final discussion of the results presented in this paper.

1.3. Related Work.

Given the focus of this work on the representation of knowledge using defaults, we believe it is most tightly related to work in the field of propositional iterated belief revision. Early work in BR resulted in wide acceptance of the AGM framework [4, 3, 2] for maintaining a belief set—a deductively closed set of formulae representing the state of the observed system. Convenient, intuitive axioms for belief revision in the propositional setting, the KM axioms, were developed by Katsuno and Mendelzon in [25].

Pointing out some inadequacies of the KM axioms in the context of repeated application of revisions, Darwiche and Pearl (DP) argue in their seminal paper [11] that, to achieve the overarching goal of iterated revision, one must maintain a set of conditional statements—an epistemic state—which, upon revision by an incoming observation, always produces a belief set accommodating that observation (axiom $\mathbf{R\!\ast\!1}$ of the DP system of axioms for iterated revision). Building on Spohn’s framework of ordinal conditional functions [46] and its implications for ranked default systems [39, 17] and revision of the associated belief sets [18], they propose to view ranking functions as epistemic states (interchangeable with the associated system of ranked defaults), as they construct appropriate revision operators. Consequent work by many authors [24, 12, 27, 22, 34, 28]—much of it very new—considers different weaknesses and benefits of the DP axioms, relating to the effect of the order in which observations are made and the manner of mutual dependence they present, and resulting in a variety of iterated revision methods, as well as in some proposals to apply belief revision methods to the control of general agents [47] based on varying computational approaches to belief revision operators (e.g. [6, 33] on the use of prime forms for this purpose).

Clearly, the problems tackled by this field generalize the representation problem we posed at the beginning of Section 1.2, but one needs merely to observe the high computational costs associated with revision operators [31, 30] (or with computing normal forms and prime forms [26]) to reach the conclusion that the existing computational approaches cannot be considered viable candidates for a solution of the representation problem in any setting where computational resources are limited.

Aiming to reduce the computational burden on the learner, we shift attention from precise syntactic computation with arbitrary propositional formulae to imposing radical simplifying assumptions on the allowed model spaces. The postulated mode of interaction between the agent and its environment (specifically, the fact that the agent is constrained to processing sequences of samples from the space $\mathbf{M}$ of realizable models, rather than arbitrary propositional formulae) suggests constructing successive upper approximations $\mathbf{M}\big{|}_{\scriptscriptstyle{t}}\supset\mathbf{M}$ of $\mathbf{M}$ belonging to a restricted class $\mathscr{C}$ that satisfies the following intuitive properties:

(1) Syntactic characterization of an element in $\mathscr{C}$ is computationally inexpensive;

(2) Each approximation is, in some sense, optimal/minimal among members of $\mathscr{C}$, given its predecessor and the last observation;

(3) Reasoning (e.g., forming a belief set) over a member of $\mathscr{C}$ is cheap.

We present results on what is, in essence, the simplest possible class $\mathscr{C}$ of model spaces satisfying these three requirements: the class of finite median algebras. This class of spaces is well studied, in several different guises and in very disparate fields. These include event structures in parallel computation [40]; median graphs in metric graph theory [8]; simply connected non-positively curved cubical complexes in formalizations of reconfiguration in robotic systems [16]; and cubulated groups in Geometric Group Theory [50], a notion to which Agol's spectacular recent achievements in the topology of 3-dimensional manifolds [1] owe much.

1.4. Structure of this Paper.

In Section 2, we extend Sageev-Roller duality⁴ to obtain all finite median algebras as duals (model spaces) of PCRs, viewed as systems of defaults. Further, we explain how to reason over model spaces in this class by leveraging their geometry to avoid satisfiability checks or, for that matter, any kind of explicit search in model space. We then explain in Section 3 how to use UMA snapshot structures to perform a variant of iterated revision in which the model-theoretic outlook on the problem is replaced by its geometric counterpart arising from Sageev-Roller duality. We discuss the necessity of relaxing the DP axiom $\mathbf{R\!\ast\!1}$, and show that there is a natural operator for computing a belief set, the coherent projection.

⁴ See [43] for a detailed development of that theory; chapters 6-7 of [50] for a brief intuitive review; and Appendix A of this paper for background material and examples developed specifically to support it.

Section 4 presents two different classes of snapshot structures—mechanisms for learning PCR representations—one motivated by Goldszmidt and Pearl’s interpretation of default reasoning as qualitative probabilistic reasoning [18], and the other based on statistical integration of the observed value signal. Finally, Section 5 presents two kinds of simulation studies:

(1) First, in a range of settings with a priori known (or readily computable) implications in the sensorium, we consider the deviation of the learned PCR from the ground truth as a function of the number of samples. This is done for both snapshot types, and under different exploration paradigms: sampling and diffusion.

(2) Next, we consider settings closer to the heart of a roboticist. We implement agents with a reactive control paradigm based entirely on their internal UMA representations and conduct comparative simulation studies of their performance across different exploration domains and snapshot types.

We close with a discussion of our results and of avenues for additional research in Section 6.

2. Model Spaces for Systems of Approximate Implications.

In this section we construct a representation for finite median algebras (see above) that is sufficiently flexible to be maintained dynamically, and we explain how to reason over these representations. We review and apply existing results about the geometry of model spaces of this class of representations, leading to complexity bounds on maintenance and exploitation.

Section 2.1 introduces the basic formal notions required for discussing our representations. Section 2.2 constructs the model spaces as dual spaces of pointed complemented relations (PCRs) and discusses their universal properties. Section 2.3 relates PCRs and their duals (the associated model spaces) to the earlier duality theory of poc sets that motivated our approach, showing that PCR duals are, in fact, poc set duals. Section 2.4 reviews known results about the geometry and topology of poc set duals. Finally, in Section 2.5 we discuss the connection between the geometry of PCR duals and algorithms enabling reasoning over PCRs.

2.1. Pointed Complemented Relations (PCR).

The nature of our application requires a generalization of the formal theory we are about to use, the Sageev-Roller duality theory of poc sets [43], prompting some changes in the language. We start with:

Definition 2.1 (pointed complemented set, PCS).

A pointed complemented set is a set $\mathbf{\Sigma}$ endowed with a self-map $a\mapsto a{{}^{\scriptscriptstyle\ast}}$ satisfying $a{{}^{\scriptscriptstyle\ast}}{{}^{\scriptscriptstyle\ast}}=a$ and $a{{}^{\scriptscriptstyle\ast}}\neq a$ for all $a\in\mathbf{\Sigma}$ , and containing a distinguished element, denoted $\mathbf{0}$ . The element $\mathbf{0}{{}^{\scriptscriptstyle\ast}}$ will be denoted $\mathbf{1}$ . Whenever possible and safe, we will abuse notation and use the symbols $\mathbf{0},\mathbf{1},\ast$ in different PCSs. For any $S\subset\mathbf{\Sigma}$ we will denote by $S{{}^{\scriptscriptstyle\ast}}$ the set of all $x{{}^{\scriptscriptstyle\ast}}$ , $x\in S$ .∎

Definition 2.2 (PCS morphism).

By a PCS morphism we mean a function $f:\mathbf{\Sigma}_{1}\to\mathbf{\Sigma}_{2}$ between PCSs satisfying $f(\mathbf{0})=\mathbf{0}$ and $f(a{{}^{\scriptscriptstyle\ast}})=f(a){{}^{\scriptscriptstyle\ast}}$ for all $a\in\mathbf{\Sigma}_{1}$ . The set of all PCS morphisms from $\mathbf{\Sigma}_{1}$ to $\mathbf{\Sigma}_{2}$ will be denoted by $\mathrm{Hom}_{\scriptscriptstyle{PCS}}\!\left(\mathbf{\Sigma}_{1},\,\mathbf{\Sigma}_{2}\right)$ .∎

Example 2.3 (set families, power sets).

Let $\mathscr{U}\subseteq\mathbf{2}^{X}$ be any collection of subsets of a fixed non-empty set $\mathbf{X}$ satisfying (1) $\varnothing\in\mathscr{U}$, and (2) $A\in\mathscr{U}\Rightarrow\mathbf{X}\smallsetminus A\in\mathscr{U}$. Then $\mathscr{U}$ is a PCS with respect to the choices $\mathbf{0}:=\varnothing$ and $A{{}^{\scriptscriptstyle\ast}}:=\mathbf{X}\smallsetminus A$.

The power set of a singleton is, up to isomorphism, the smallest PCS, which we denote by $\mathbf{2}$ , and identify with the set $\{\bot,\top\}$ . Also, the power set $\mathbf{2}^{X}$ will be routinely identified with the set of all functions $X\to\{\bot,\top\}$ .

Example 2.4 (PCS over an alphabet).

Suppose $\mathbb{A}$ is a finite collection of symbols, and think of them as atoms of the propositional calculus over $\mathbb{A}$ . The extended collection of literals over $\mathbb{A}$ ,

$$\mathbf{\Sigma}(\mathbb{A}):=\{\bot,\top\}\cup\bigcup_{a\in\mathbb{A}}\{a,\neg a\},$$

may be thought of as a PCS when one declares $\mathbf{0}:=\bot$ , $\bot{{}^{\scriptscriptstyle\ast}}:=\top$ , $\top{{}^{\scriptscriptstyle\ast}}:=\bot$ and $a{{}^{\scriptscriptstyle\ast}}:=\neg a$ , $(\neg a){{}^{\scriptscriptstyle\ast}}:=a$ for all $a\in\mathbb{A}$ . Hereafter, $\top$ and $\bot$ stand for the truth values True and False, respectively.

The reason for considering PCSs is that $\ast$ -selections “live on them”:

Definition 2.5 ( $\ast$ -selection, the Hamming cube).

Let $\mathbf{\Sigma}$ be a PCS. By a $\ast$ -selection on $\mathbf{\Sigma}$ we mean a subset $S\subset\mathbf{\Sigma}$ such that $S\cap S{{}^{\scriptscriptstyle\ast}}=\varnothing$ . In addition, a $\ast$ -selection $S$ on $\mathbf{\Sigma}$ is complete, if $S\cup S{{}^{\scriptscriptstyle\ast}}=\mathbf{\Sigma}$ . The set of all $\ast$ -selections $S\subset\mathbf{\Sigma}$ with $\mathbf{1}\in S$ will be denoted by $\mathbf{S}(\mathbf{\Sigma})$ , and referred to as the [combinatorial] Hamming cube on $\mathbf{\Sigma}$ . Its set of vertices, the complete $\ast$ -selections in $\mathbf{S}(\mathbf{\Sigma})$ , will be denoted by $\mathbb{H}(\mathbf{\Sigma})$ . ∎

We now consider these notions in the context of our intended application.

2.1.1. Binary Sensing, Possible Worlds and Perceptual Classes.

Suppose $\alpha$ is an observer of some system $\mathscr{S}$ as it undergoes the transitions along a state trajectory $(p_{t})_{t=-\infty}^{\infty}$ , and suppose $\mathbb{A}$ is a finite set of unique labels for the Boolean queries available to $\alpha$ —this observer’s sensorium. We assume observations of $\mathscr{S}$ by $\alpha$ begin at $t=0$ . It will not matter for our discussion whether the trajectory of $\mathscr{S}$ in any particular instance does indeed extend indefinitely into the past or future: if needed, one may set the value of $p_{t}$ to be eventually constant (in either direction).

By a history of $\mathscr{S}$ we mean a sequence of the form $\mathbf{x}:=(x_{s})_{s=-\infty}^{0}$ , where $x_{s}$ is a state of $\mathscr{S}$ for all $s$ , and $x_{0}$ represents the current state of the history $\mathbf{x}$ ; $x_{-1}$ represents the preceding state, and so on. Given a trajectory $(p_{t})$ of $\mathscr{S}$ observed by $\alpha$ , at each time $t\geq 0$ , the history that manifests at time $t$ is given by $x_{s}:=p_{t+s}$ .

Henceforth, we let $\mathbf{X}$ denote the space of histories possible for the system $\mathscr{S}$ given the initial history manifested at time $t=0$ (as is the case in all physical systems, $\mathscr{S}$ may have its own dynamics, disqualifying some histories from manifesting at any time $t>0$ , or making such events highly improbable). To say that $\alpha$ ’s queries/sensors are time-shift invariant is to say that each query is represented by a fixed Boolean function of the manifested history. In other words, the sensorium is defined by a PCS morphism $\rho:\mathbf{\Sigma}\to\mathbf{2}^{\mathbf{X}}$ , $\mathbf{\Sigma}:=\mathbf{\Sigma}(\mathbb{A})$ , with a sensor $s\in\mathbf{\Sigma}(\mathbb{A})$ reporting $\top$ on history $x\in\mathbf{X}$ if and only if $x\in\rho(s)$ .

The mapping $\rho$ induces a partition of $\mathbf{X}$ into perceptual classes, as follows. Construct a map $\rho{{}^{\scriptscriptstyle\ast}}\colon\mathbf{X}\to\mathbb{H}(\mathbf{\Sigma})$ by setting $s\in\rho{{}^{\scriptscriptstyle\ast}}(x)$ if and only if $x\in\rho(s)$; thus each point is mapped to the set of queries (including complements) that evaluate to $\top$ on that point. Two points $x,y\in\mathbf{X}$ are sensory-equivalent if $\rho{{}^{\scriptscriptstyle\ast}}(x)=\rho{{}^{\scriptscriptstyle\ast}}(y)$. The image $\mathbf{M}(\rho):=\mathrm{Im}(\rho{{}^{\scriptscriptstyle\ast}})$ is the set of possible perceptual states of $\alpha$ in the system $\mathscr{S}$, given $\rho$ and the system’s initial history. We will also refer to a world/$\ast$-selection $u\in\mathbb{H}(\mathbf{\Sigma})$ as consistent if and only if $u\in\mathbf{M}(\rho)$, or, in other words, if and only if $u$ is witnessed (through $\rho$) by a point of $\mathbf{X}$.
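To make the construction of $\rho{{}^{\scriptscriptstyle\ast}}$ concrete, here is a minimal Python sketch. It assumes a hypothetical string encoding of $\mathbf{\Sigma}(\mathbb{A})$ (an atom `"a"`, its negation `"~a"`, and `"T"`/`"F"` for $\top$/$\bot$) and represents each sensor $\rho(a)$ by a Boolean predicate on a finite set of points; the helper names `star`, `rho_star` and `perceptual_classes` are illustrative only.

```python
# Hypothetical encoding of Sigma(A): an atom "a", its negation "~a",
# and "T"/"F" for top/bottom (the PCS elements 1 and 0).
def star(lit):
    """Complementation a -> a* on Sigma(A)."""
    if lit == "T":
        return "F"
    if lit == "F":
        return "T"
    return lit[1:] if lit.startswith("~") else "~" + lit


def rho_star(x, sensors):
    """rho*(x): the complete *-selection of literals that evaluate to True at x.

    `sensors` maps each atom a to a Boolean predicate standing in for rho(a);
    rho(~a) is then the complement of rho(a), as a PCS morphism requires.
    """
    selection = {"T"}  # rho(T) = rho(F*) = X, so T holds at every point
    for atom, predicate in sensors.items():
        selection.add(atom if predicate(x) else star(atom))
    return frozenset(selection)


def perceptual_classes(points, sensors):
    """Group the points of X by rho*; the keys form M(rho) = Im(rho*)."""
    classes = {}
    for x in points:
        classes.setdefault(rho_star(x, sensors), []).append(x)
    return classes


# Toy example (assumed): X = {0, ..., 9}, where sensor b implies sensor a.
classes = perceptual_classes(range(10), {"a": lambda x: x < 5, "b": lambda x: x < 3})
for world, members in classes.items():
    print(sorted(world), members)
```

In this toy example the sensor `b` implies `a`, so the world $\{\top,\neg a,b\}$ is never witnessed and $\mathbf{M}(\rho)$ contains only three of the four vertices of $\mathbb{H}(\mathbf{\Sigma})$.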

2.1.2. Concept Presentation of Perceptual States.

Digging deeper into the formalism presented just now, observe that $\ast$ -selections $S\subset\mathbf{\Sigma}(\mathbb{A})$ are in one-to-one correspondence with vectors, as defined in concept learning [48]. Recall that a vector is an assignment $v:\mathbb{A}\to\{\mathbf{0},\mathbf{1},{\scriptscriptstyle\bullet}\}$ of values standing for $\bot$ , $\top$ , and “undetermined”, respectively, to the alphabet $\mathbb{A}$ . A vector is total if it has no $({\scriptscriptstyle\bullet})$ values. The map $v\mapsto\sigma_{v}:=v{{}^{\scriptscriptstyle-1}}(\mathbf{1})\cup(v{{}^{\scriptscriptstyle-1}}(\mathbf{0})){{}^{\scriptscriptstyle\ast}}$ is then a correspondence between vectors over $\mathbb{A}$ and $\ast$ -selections on the PCS $\mathbf{\Sigma}(\mathbb{A})$ , mapping the set of total vectors onto the set of complete $\ast$ -selections. In more geometric terms, a complete $\ast$ -selection—which corresponds to a complete conjunctive monomial (aka complete term) over $\mathbb{A}$ —defines a vertex of the cube $[0,1]^{\mathbb{A}}$ , while a $\ast$ -selection $S$ with $|S|=|\mathbb{A}|-d$ corresponds to a $d$ -dimensional face. We will refer to $[0,1]^{\mathbb{A}}$ as the Hamming cube. The advantage of PCS terminology here is that $\ast$ -selections on $\mathbf{\Sigma}$ enumerate the faces of the Hamming cube without us having to pick an origin for the cube.

Pushing the geometric viewpoint a bit further, we consider the notion of concepts. In [48], Valiant defines concepts as mappings $F$ of the space of vectors to $\{\bot,\top\}$, satisfying the requirement that $F(v)=1$ on a vector $v$ if and only if $F(w)=1$ for all total vectors $w$ which agree with $v$ on those $a\in\mathbb{A}$ where $v(a)\neq{\scriptscriptstyle\bullet}$. In other words, concepts correspond to collections $K$ of faces of the Hamming cube, possibly of varying dimensions, satisfying the condition that a face $F$ belongs to $K$ if and only if every vertex of $F$ lies in $K$. Such $K$ are precisely the sub-complexes of the Hamming cube obtainable from it by vertex deletions (as in the case of graphs, deleting a vertex from a cubical complex requires the removal of all the adjoining faces).
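A sketch of concept membership under the same hypothetical dict encoding of vectors: a face belongs to the vertex-deletion subcomplex $K$ exactly when all of its vertices (the total completions of $v$) survive. Storing vertices as sorted tuples of `(atom, value)` pairs is an arbitrary choice made here so they fit in a Python set; `completions`, `in_concept` and `key` are illustrative names.

```python
from itertools import product


def completions(v):
    """Vertices of the face described by the vector v: all total vectors agreeing
    with v wherever v is determined. Each vertex is returned as a sorted tuple of
    (atom, value) pairs so that it can be stored in a set."""
    free = sorted(a for a, val in v.items() if val is None)
    for bits in product((0, 1), repeat=len(free)):
        w = dict(v)
        w.update(zip(free, bits))
        yield tuple(sorted(w.items()))


def in_concept(v, surviving_vertices):
    """A face lies in the vertex-deletion subcomplex iff all of its vertices survive."""
    return all(w in surviving_vertices for w in completions(v))


def key(total_vector):
    return tuple(sorted(total_vector.items()))


# Assumed example: delete the vertex a=0, b=1 from the square [0,1]^{a,b}.
K = {key({"a": 1, "b": 1}), key({"a": 1, "b": 0}), key({"a": 0, "b": 0})}
print(in_concept({"a": 1, "b": None}, K), in_concept({"a": None, "b": 1}, K))
```

Deleting the single vertex $a=0,b=1$ removes the two edges through it and the square itself, while the remaining edge $a=1$ survives.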

Now we return to the observer $\alpha$ and the system $\mathscr{S}$ whose evolution it observes through the queries realized by $\rho\colon\mathbf{\Sigma}(\mathbb{A})\to\mathbf{2}^{\mathbf{X}}$ , as discussed in the preceding section. Thinking of the space of perceptual classes $\mathbf{M}(\rho)$ as a concept gives rise to a cubical sub-complex, say $\mathtt{Cube}\!\,(\rho)$ , of the Hamming cube, whose faces correspond to those $\ast$ -selections on the PCS $\mathbf{\Sigma}=\mathbf{\Sigma}(\mathbb{A})$ that are witnessed (via $\rho$ ) by a point in $\mathbf{X}$ . Thus, precise reasoning and planning over $\mathbf{M}(\rho)$ depends on one’s ability to efficiently capture/encode: (1) the notion of consistency produced by the map $\rho$ ; (2) the topological properties (e.g. connectivity, contractibility) of $\mathtt{Cube}\!\,(\rho)$ ; and (3) the geometric properties (e.g. shortest paths, curvature, isoperimetric inequalities) of $\mathtt{Cube}\!\,(\rho)$ . The class of approximating model spaces we propose to use as proxies for $\mathbf{M}(\rho)$ is a result of weakening this notion of consistency to the extreme, all the way to the notion of coherence discussed in the next section.

2.1.3. PCRs, Implications and Coherence.

Definition 2.6 (pointed complemented relation, PCR).

Let $\mathbf{\Sigma}$ be a PCS. By a pointed complemented relation over $\mathbf{\Sigma}$ we mean a set $G\subseteq\mathbf{\Sigma}\times\mathbf{\Sigma}$ satisfying $\mathbf{0}a\in G$ and $ab\in G\Leftrightarrow b{{}^{\scriptscriptstyle\ast}}a{{}^{\scriptscriptstyle\ast}}\in G$ for all $a,b\in\mathbf{\Sigma}$. (To avoid a proliferation of parentheses, we write $ab$ to denote the pair $(a,b)\in\mathbf{\Sigma}\times\mathbf{\Sigma}$.)∎

In the context of the representation problem, one should think of a PCR $G$ over $\mathbf{\Sigma}$ as a record of Boolean implications believed to be valid over $\mathbf{\Sigma}=\mathbf{\Sigma}(\mathbb{A})$ , conditioned on the particular space of histories being observed. In this respect, a PCR is a restricted form of the notion of a system of defaults, as discussed, e.g. in [18]. Some of these implications are specified directly ( $ab\in G$ to be read as “it is believed that $b$ follows from $a$ ”), while others are derived as their consequences, by transitive closure. Hence the following language:

Definition 2.7.

Given a PCR $G$ over a PCS $\mathbf{\Sigma}$ , for any $a,b\in\mathbf{\Sigma}$ , $S\subseteq\mathbf{\Sigma}$ , one defines the following:

•

Write $a\leq_{G}b$ if $ab$ lies in the reflexive and transitive closure of $G$ ;

•

The $G$ -equivalence class of $a\in\mathbf{\Sigma}$ , denoted $[a]_{G}$ , is the equivalence class of $a$ under the relation $a\sim b\Leftrightarrow a\leq_{G}b\wedge b\leq_{G}a$ on $\mathbf{\Sigma}$ ;

•

The forward (backward) closure, $S\!\uparrow$ (resp. $S\!\downarrow$ ), of $S$ with respect to $G$ is the set of all $b\in\mathbf{\Sigma}$ for which $a\leq_{G}b$ (respectively $b\leq_{G}a$ ) holds for some $a\in S$ ;

•

Note that $S\subset S\!\uparrow$ . One says that $S$ is forward-closed if $S\!\uparrow=S$ ;

•

Finally, we observe that $S{{}^{\scriptscriptstyle\ast}}\!\uparrow=S\!\downarrow{{}^{\scriptscriptstyle\ast}}$ for all $S\subseteq\mathbf{\Sigma}$ .

We will often drop the subscripts $G$ when no ambiguity can arise.∎
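The closures of Definition 2.7 amount to graph reachability. A minimal sketch, assuming the hypothetical string encoding of $\mathbf{\Sigma}(\mathbb{A})$ used for illustration (`"~a"` for $\neg a$, `"T"`/`"F"` for $\mathbf{1}$/$\mathbf{0}$): `make_pcr` builds the smallest PCR containing a list of implications, `upsets` computes $\{b : a\leq_{G}b\}$ by breadth-first search, and the forward and backward closures follow directly. All names are hypothetical.

```python
from collections import defaultdict, deque


def star(lit):
    """Complementation a -> a* on Sigma(A) ("T"/"F" are top/bottom, "~a" is not-a)."""
    if lit in ("T", "F"):
        return "F" if lit == "T" else "T"
    return lit[1:] if lit.startswith("~") else "~" + lit


def make_pcr(atoms, implications):
    """Smallest PCR over Sigma(atoms) containing the given pairs ab:
    adds 0a and a1 for every a, and b*a* for every ab."""
    sigma = {"T", "F"} | set(atoms) | {"~" + a for a in atoms}
    G = {("F", a) for a in sigma} | {(a, "T") for a in sigma}
    for a, b in implications:
        G |= {(a, b), (star(b), star(a))}
    return sigma, G


def upsets(sigma, G):
    """above[a] = {b : a <=_G b}: reflexive-transitive closure by BFS over G."""
    succ = defaultdict(set)
    for a, b in G:
        succ[a].add(b)
    above = {}
    for a in sigma:
        seen, queue = {a}, deque([a])
        while queue:
            for y in succ[queue.popleft()]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        above[a] = seen
    return above


def forward(S, above):
    """Forward closure S-up."""
    return set().union(*(above[a] for a in S))


def backward(S, above):
    """Backward closure S-down: every b with b <=_G a for some a in S."""
    return {b for b in above if above[b] & set(S)}


sigma, G = make_pcr(["a", "b", "c"], [("a", "b"), ("b", "c")])
above = upsets(sigma, G)
print(sorted(forward({"a"}, above)))
```

With the implications $a\to b$ and $b\to c$, the forward closure of $\{a\}$ is $\{a,b,c,\top\}$, and one can check the identity $S{{}^{\scriptscriptstyle\ast}}\!\uparrow=S\!\downarrow{{}^{\scriptscriptstyle\ast}}$ on any $S$ by comparing `forward` of the starred set with the starred `backward`.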

Definition 2.8 (PCR morphism).

Let $G_{1},G_{2}$ be PCRs over $\mathbf{\Sigma}_{1},\mathbf{\Sigma}_{2}$, respectively. A morphism of PCRs from $G_{1}$ to $G_{2}$ is a PCS morphism $f:\mathbf{\Sigma}_{1}\to\mathbf{\Sigma}_{2}$, additionally satisfying $f(a)\leq f(b)$ in $G_{2}$ whenever $ab\in G_{1}$. The set of all morphisms from $G_{1}$ to $G_{2}$ will be denoted by $\mathrm{Hom}_{\scriptscriptstyle{PCR}}\!\left(G_{1},\,G_{2}\right)$.∎

The primary example of a PCR for this work derives from the view of a power set as a PCS (Example 2.3):

Example 2.9 (Set Families as PCRs).

Let $\mathbf{X}\neq\varnothing$ be a set. Then any collection $\mathscr{U}$ of subsets of $\mathbf{X}$ that is closed under complementation and satisfies $\varnothing\in\mathscr{U}$ gives rise to the PCR of all pairs $(A,B)$ with $A\subseteq B$ , $\mathbf{0}=\varnothing$ and $A{{}^{\scriptscriptstyle\ast}}=\mathbf{X}\smallsetminus A$ . In what follows, $\mathbf{2}^{X}$ will always be regarded as a PCR in this way, for any $\mathbf{X}\neq\varnothing$ .∎

Another ‘canonical’ example of a PCR to keep in mind is:

Example 2.10 (Less classical PCRs).

Let $\mathbf{X}\neq\varnothing$ be any set. Then $[0,1]^{\mathbf{X}}$ may be endowed with the structure of a PCR by setting $\mathbf{0}(\mathbf{x}):=0$, $\mathbf{x}\in\mathbf{X}$, and, for any $\psi,\varphi\in[0,1]^{\mathbf{X}}$, setting $\psi^{\ast}(\mathbf{x}):=1-\psi(\mathbf{x})$, $\mathbf{x}\in\mathbf{X}$, and $(\varphi,\psi)\in G$ if and only if $\varphi(\mathbf{x})\leq\psi(\mathbf{x})$ for all $\mathbf{x}\in\mathbf{X}$.∎

Our notion of model for a PCR rests on the following weak form of consistency:

Definition 2.11.

Let $G$ be a PCR over $\mathbf{\Sigma}$ . A subset $S\subseteq\mathbf{\Sigma}$ is said to be $G$ -coherent, if no pair $a,b\in S$ satisfies $a\leq b{{}^{\scriptscriptstyle\ast}}$ .∎

Note that a $G$ -coherent set is always a $\ast$ -selection on $\mathbf{\Sigma}$ . Furthermore:

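$$S\text{ is coherent}\iff S\cap S{{}^{\scriptscriptstyle\ast}}\!\downarrow=\varnothing\iff S\!\uparrow\cap\,S{{}^{\scriptscriptstyle\ast}}=\varnothing\iff S\!\uparrow\cap\,S{{}^{\scriptscriptstyle\ast}}\!\downarrow=\varnothing,$$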

so coherence is preserved by forward closure. Coherent, forward-closed sets may be thought of as the natural counterparts of the notion of a belief state in this setting. We now turn to studying the appropriate notion of model.
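Definition 2.11 translates into a one-line check. This sketch assumes the up-sets $\{c : a\leq_{G}c\}$ have already been computed; here they are written out by hand for the PCR on $\mathbf{\Sigma}(\{a,b\})$ generated by $a\to b$. The names `is_coherent` and `forward` are illustrative.

```python
def star(lit):
    """Complementation a -> a* on Sigma(A) ("T"/"F" are top/bottom, "~a" is not-a)."""
    if lit in ("T", "F"):
        return "F" if lit == "T" else "T"
    return lit[1:] if lit.startswith("~") else "~" + lit


def is_coherent(S, above):
    """S is G-coherent iff no a, b in S satisfy a <=_G b*.
    `above[a]` is assumed to be the up-set {c : a <=_G c}."""
    return not any(star(b) in above[a] for a in S for b in S)


def forward(S, above):
    """Forward closure S-up."""
    return set().union(*(above[a] for a in S))


# Hand-written up-sets for the PCR on Sigma({a, b}) generated by a -> b (assumed example).
above = {
    "F": {"F", "T", "a", "b", "~a", "~b"},
    "T": {"T"},
    "a": {"a", "b", "T"},
    "b": {"b", "T"},
    "~a": {"~a", "T"},
    "~b": {"~b", "~a", "T"},
}
print(is_coherent({"a", "~b"}, above), is_coherent(forward({"a"}, above), above))
```

With this table, $\{a,\neg b\}$ is incoherent (since $a\leq b$), while $\{a\}$ and its forward closure $\{a,b,\top\}$ are both coherent; $\{\mathbf{0}\}$ is incoherent because $\mathbf{0}\leq\mathbf{1}=\mathbf{0}{{}^{\scriptscriptstyle\ast}}$, and $\{a,\neg a\}$ is incoherent, as no coherent set can fail to be a $\ast$-selection.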

2.2. Model Spaces as Dual Spaces

Definition 2.12 (duals).

Let $G$ be a PCR over $\mathbf{\Sigma}$ . The set $G^{\circ}$ of maximal $G$ -coherent subsets of $\mathbf{\Sigma}$ is the dual of $G$ . The set of all forward-closed $G$ -coherent subsets will be denoted $\mathbf{C}(G)$ .∎

A standard application of Zorn’s lemma shows that any $G$ -coherent subset of $\mathbf{\Sigma}$ is contained in an element of $G^{\circ}$ . Note also that $G^{\circ}\subseteq\mathbf{C}(G)$ .

Example 2.13 (the orthogonal PCR and the Hamming cube).

The simplest example of a dual space is one where the PCR in question is as small as possible. Let $\mathbf{\Sigma}$ be a PCS. The smallest PCR over $\mathbf{\Sigma}$ contains only pairs of the forms $\mathbf{0}a$ and $a\mathbf{1}$ . We will denote this PCR by $\mathbb{O}(\mathbf{\Sigma})$ and refer to it as the orthogonal PCR over $\mathbf{\Sigma}$ . It is clear that $\mathbb{O}(\mathbf{\Sigma})^{\circ}=\mathbb{H}(\mathbf{\Sigma})$ , the “Hamming cube” from Definition 2.5.∎
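For the orthogonal PCR the dual is the full cube, which can be confirmed by brute force on a small alphabet. In this sketch (same hypothetical literal encoding), `orthogonal_leq` encodes the order generated by $\mathbb{O}(\mathbf{\Sigma})$ and `hamming_cube` enumerates the complete $\ast$-selections containing $\mathbf{1}$; each of the $2^{|\mathbb{A}|}$ vertices is coherent, and adding any further element breaks coherence, so each is maximal.

```python
from itertools import product


def star(lit):
    """Complementation a -> a* on Sigma(A) ("T"/"F" are top/bottom, "~a" is not-a)."""
    if lit in ("T", "F"):
        return "F" if lit == "T" else "T"
    return lit[1:] if lit.startswith("~") else "~" + lit


def hamming_cube(atoms):
    """H(Sigma(A)): complete *-selections containing "T", one per vertex of [0,1]^A."""
    for signs in product((True, False), repeat=len(atoms)):
        yield frozenset(["T"] + [a if s else "~" + a for a, s in zip(atoms, signs)])


def orthogonal_leq(a, b):
    """a <= b for O(Sigma): reflexivity, 0 <= a and a <= 1, and nothing else."""
    return a == b or a == "F" or b == "T"


def is_coherent(S, leq):
    """No a, b in S with a <= b*."""
    return not any(leq(a, star(b)) for a in S for b in S)


cube = list(hamming_cube(["a", "b", "c"]))
print(len(cube), all(is_coherent(u, orthogonal_leq) for u in cube))
```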

Example 2.14 (‘bad’ queries).

The definitions given above do not preclude one from considering, for example, the PCR $G_{1}=\{\mathbf{0}\mathbf{1},\mathbf{1}\mathbf{0}\}$. It is easy to see that $G_{1}^{\circ}=\{\varnothing\}$. At the same time, the smaller $G_{2}=\{\mathbf{0}\mathbf{1}\}$ has $G_{2}^{\circ}=\{\{\mathbf{1}\}\}$. More generally, for any $a\in\mathbf{\Sigma}$, having $a\leq a{{}^{\scriptscriptstyle\ast}}$ precludes $a$ from belonging to any $G$-coherent set. In particular, if both $a\leq a{{}^{\scriptscriptstyle\ast}}$ and $a{{}^{\scriptscriptstyle\ast}}\leq a$ hold, then no $G$-coherent set is a complete selection on $\mathbf{\Sigma}$.∎

Following the last example, two definitions are in order:

Definition 2.15.

The trivial PCR, henceforth also denoted by $\mathbf{2}$ , is the PCR over $\mathbf{\Sigma}=\{\mathbf{0},\mathbf{1}\}$ containing only $\mathbf{0}\mathbf{1}$ .∎

Definition 2.16 (negligible query, degenerate graph).

Let $G$ be a PCR over $\mathbf{\Sigma}$ . An element $a\in\mathbf{\Sigma}$ is $G$ -negligible, if $a\leq a{{}^{\scriptscriptstyle\ast}}$ . Denote the set of negligible elements by $N(G)$ . We say that $G$ is degenerate if $\mathbf{\Sigma}$ contains a negligible element whose complement is also negligible. Note that $N(G)\!\downarrow=N(G)$ .∎
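Negligibility and degeneracy are likewise direct to test once the up-sets of $\leq_{G}$ are known. The sketch below assumes `above[a]` holds $\{c : a\leq_{G}c\}$, written out by hand for the two PCRs $G_{1}$, $G_{2}$ of Example 2.14; the function names are hypothetical.

```python
def star(lit):
    """Complementation a -> a* on Sigma(A) ("T"/"F" are top/bottom, "~a" is not-a)."""
    if lit in ("T", "F"):
        return "F" if lit == "T" else "T"
    return lit[1:] if lit.startswith("~") else "~" + lit


def negligible(above):
    """N(G) = {a : a <=_G a*}, where above[a] is assumed to be {c : a <=_G c}."""
    return {a for a in above if star(a) in above[a]}


def is_degenerate(above):
    """G is degenerate iff some negligible element has a negligible complement."""
    N = negligible(above)
    return any(star(a) in N for a in N)


# Up-sets for the two PCRs of Example 2.14 over Sigma = {0, 1} = {"F", "T"}.
above_G1 = {"F": {"F", "T"}, "T": {"T", "F"}}  # G1 = {01, 10}
above_G2 = {"F": {"F", "T"}, "T": {"T"}}       # G2 = {01}
print(negligible(above_G1), is_degenerate(above_G1))
print(negligible(above_G2), is_degenerate(above_G2))
```

As expected, $G_{1}$ is degenerate, since both $\mathbf{0}$ and $\mathbf{1}$ are negligible, while $G_{2}$ has $N(G_{2})=\{\mathbf{0}\}$ and is non-degenerate.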

Proposition 2.17.

For a PCR $G$ over $\mathbf{\Sigma}$ , the following are equivalent:

(1)

$G$ is non-degenerate;

(2)

Every element of $G^{\circ}$ is a complete selection on $\mathbf{\Sigma}$;

(3)

Some element of $G^{\circ}$ is a complete selection on $\mathbf{\Sigma}$.

Proof.

See Section B.1.∎

The impact of this result on our representation problem is twofold. First, it provides a clear and easily verifiable criterion for when the dual space of a PCR consists (only!) of possible worlds. Second, it introduces a new and consistent notion of a query of low import, not involving arbitrary choices such as thresholding.

Proposition 2.18.

Let $G$ be a non-degenerate PCR over the PCS $\mathbf{\Sigma}$ . Then the mapping $\chi\colon\mathrm{Hom}_{\scriptscriptstyle{PCR}}\!\left(G,\,\mathbf{2}\right)\to G^{\circ}$ defined by $\chi(f)=f{{}^{\scriptscriptstyle-1}}(\mathbf{1})$ is a bijection.

Proof.

See Section B.2.∎

Remark 2.19.

Note that the mapping $\chi$ is independent of the choice of $G$ .

The last proposition explains the sense in which $G^{\circ}$ may be thought of as a dual space of $G$ . As with other instances of duality, this is useful because it enables dual mappings:

Definition 2.20.

Let $f:G_{1}\to G_{2}$ be a PCR morphism. The dual mapping $f^{\circ}:G_{2}^{\circ}\to G_{1}^{\circ}$ is defined by $f^{\circ}(S)=f{{}^{\scriptscriptstyle-1}}(S)$. Alternatively, upon applying the identification in Proposition 2.18, for any $\varphi\in\mathrm{Hom}_{\scriptscriptstyle{PCR}}\!\left(G_{2},\,\mathbf{2}\right)$ one has $f^{\circ}(\varphi)=\varphi\circ f$, an element of $\mathrm{Hom}_{\scriptscriptstyle{PCR}}\!\left(G_{1},\,\mathbf{2}\right)$.∎

We remark that, since morphisms are composable (meaning that the composition $(f\circ g)(a):=f(g(a))$ of two morphisms is a morphism as well), so are their dual mappings, producing the identity $(f\circ g)^{\circ}=g^{\circ}\circ f^{\circ}$ .
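Dual mappings are simply preimages. In the sketch below a PCR morphism is given as a Python dict (an assumed encoding), and the example is the chain of inclusions $\mathbf{\Sigma}(\varnothing)\to\mathbf{\Sigma}(\{a\})\to\mathbf{\Sigma}(\{a,b\})$: the dual of an inclusion restricts a world to the smaller sensorium, forgetting the other queries. The helpers `dual_map` and `compose` are hypothetical and let one check $(f\circ g)^{\circ}=g^{\circ}\circ f^{\circ}$ on examples.

```python
def dual_map(f):
    """Return f°: S |-> f^{-1}(S), for a morphism f given as a dict Sigma_1 -> Sigma_2."""
    def f_dual(S):
        return frozenset(a for a, fa in f.items() if fa in S)
    return f_dual


def compose(f, g):
    """(f o g)(a) = f(g(a)) for dict-encoded maps."""
    return {a: f[ga] for a, ga in g.items()}


# Assumed example: inclusions Sigma({}) -> Sigma({a}) -> Sigma({a, b}).
g = {"F": "F", "T": "T"}
f = {"F": "F", "T": "T", "a": "a", "~a": "~a"}
u = frozenset({"T", "a", "~b"})  # a world over {a, b}
print(sorted(dual_map(f)(u)))     # its restriction to the query a
```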

Example 2.21.

Let $G$ be a non-degenerate PCR over a PCS $\mathbf{\Sigma}$. Then the identity mapping $\iota:\mathbb{O}(\mathbf{\Sigma})\to G$, that is, $\iota(a)=a$ for all $a\in\mathbf{\Sigma}$, is clearly a morphism of PCRs. The dual mapping $\iota^{\circ}:G^{\circ}\to\mathbb{H}(\mathbf{\Sigma})$ is then, clearly, an injection. This reflects the intuitive notion that the dual of any (non-degenerate) PCR may be “excavated” out of a standard Hamming cube by going over all $G$-incoherent pairs, one by one, and successively deleting any vertices of $\mathbb{H}(\mathbf{\Sigma})$ which contain the given pair.
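The excavation can be sketched directly: build $\leq_{G}$ (here with Warshall's transitive-closure algorithm, a choice made for brevity), list the incoherent pairs $a\leq_{G}b{{}^{\scriptscriptstyle\ast}}$, and delete from $\mathbb{H}(\mathbf{\Sigma})$ every vertex containing one. This is a brute-force illustration for small non-degenerate $G$ (where, by Proposition 2.17, the surviving vertices are exactly $G^{\circ}$), not an efficient algorithm; the encoding and names are hypothetical.

```python
from itertools import product


def star(lit):
    """Complementation a -> a* on Sigma(A) ("T"/"F" are top/bottom, "~a" is not-a)."""
    if lit in ("T", "F"):
        return "F" if lit == "T" else "T"
    return lit[1:] if lit.startswith("~") else "~" + lit


def leq_closure(sigma, G):
    """Reflexive-transitive closure of G on sigma (Warshall's algorithm), as a set of pairs."""
    rel = {(a, a) for a in sigma} | set(G)
    for k in sigma:
        for i in sigma:
            if (i, k) in rel:
                for j in sigma:
                    if (k, j) in rel:
                        rel.add((i, j))
    return rel


def excavate(atoms, implications):
    """Delete from H(Sigma) every vertex containing a G-incoherent pair a, b (a <=_G b*)."""
    sigma = {"T", "F"} | set(atoms) | {"~" + a for a in atoms}
    # Smallest PCR containing the implications: 0a and a1 for all a, b*a* for each ab.
    G = {("F", a) for a in sigma} | {(a, "T") for a in sigma}
    for a, b in implications:
        G |= {(a, b), (star(b), star(a))}
    leq = leq_closure(sigma, G)
    incoherent = [(a, b) for a in sigma for b in sigma if (a, star(b)) in leq]
    vertices = {frozenset(["T"] + [a if s else "~" + a for a, s in zip(atoms, signs)])
                for signs in product((True, False), repeat=len(atoms))}
    for a, b in incoherent:  # one pair at a time, as described above
        vertices = {u for u in vertices if not (a in u and b in u)}
    return vertices


print(len(excavate(["a", "b"], [])), len(excavate(["a", "b"], [("a", "b")])))
```

With no implications the full square survives; with the single implication $a\to b$ over $\{a,b\}$, exactly one vertex, $\{\top,a,\neg b\}$, is deleted.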

We further specialize the example to our representation problem, considering the effect of fixing a PCR structure on a given PCS:

Proposition 2.22 (Universality of Representation).

Let $G$ be a non-degenerate PCR over $\mathbf{\Sigma}$ . Then, for any non-empty set $\mathbf{X}$ and every PCS morphism $\rho\colon\mathbf{\Sigma}\to\mathbf{2}^{\mathbf{X}}$ , the set $\mathbf{M}(\rho)$ of all complete $\ast$ -selections witnessed (via $\rho$ ) by a point in $\mathbf{X}$ (in the sense of Section 2.1.1) is contained in $G^{\circ}$ whenever $\rho$ is a PCR morphism. Moreover, $G^{\circ}$ is the smallest subset of $\mathbb{H}(\mathbf{\Sigma})$ having this property.

Proof.

See Section B.3.∎

Thus, the dual $G^{\circ}$ of a non-degenerate $G$ serves as a minimal model of the state space of the system (agent + environment), and remains valid under any change to this system for as long as $\rho$ remains order-preserving. This is a form of robustness of the representation to changes in the coupling between the agent’s sensory equipment and the environment: changes leaving the implication record invariant provide no reason for the agent to alter its reasoning.

2.3. Reducing PCR Representations.

The universality of PCR duals motivates a deeper study of their properties, seeking a better understanding of the degree of redundancy in the description of $G^{\circ}$ by a PCR $G$. This is not a mere technical issue: while non-degeneracy guarantees the adequacy of our notion of an associated “possible world”, it is not obvious that it also provides sufficient control over the quality of inference. The intended application, inferring approximate implications from partial observations, is well known to be problematic in the absence of simplifying assumptions (e.g. the ubiquitous restriction to directed acyclic graphs in the context of Bayesian networks). It is therefore crucial to clarify the precise formal sense in which a PCR may be viewed as encoding a “record of implications”, which is the purpose of this section. A key notion in any such discussion is what it means for a query, or for the difference of two queries, to be negligible: negligible but non-zero differences tend to accumulate, under transitive closure, into material ones.

Looking more closely at the setting of the last proposition, notice that, for a fixed $\rho$ , the assumption that $\rho$ is a morphism translates into the following. The property $a\leq b\Rightarrow\rho(a)\subseteq\rho(b)$ for all $a,b\in\mathbf{\Sigma}$ implies $\rho(a)=\varnothing$ for any $a\in N(G)$ (because $\varnothing$ is the only negligible element of $\mathbf{2}^{\mathbf{X}}$ ); furthermore, $\rho(a)=\rho(b)$ must hold whenever $a$ and $b$ are $G$ -equivalent (recall Definition 2.7). These identifications lead us to recall Roller’s definition of a poc set from [43]:

Definition 2.23 (poc set).

A poc set is a tuple $\mathbf{P}=(\mathbf{\Sigma},\leq,\mathbf{0},\ast)$ where $(\mathbf{\Sigma},\leq)$ is a partially ordered set with a minimum element $\mathbf{0}$, endowed with an order-reversing involution $a\mapsto a{{}^{\scriptscriptstyle\ast}}$ (that is, $a{{}^{\scriptscriptstyle\ast}}{{}^{\scriptscriptstyle\ast}}=a$ and $a\leq b\Rightarrow b{{}^{\scriptscriptstyle\ast}}\leq a{{}^{\scriptscriptstyle\ast}}$ for all $a,b\in\mathbf{\Sigma}$) satisfying $\mathbf{0}{{}^{\scriptscriptstyle\ast}}\neq\mathbf{0}$ and $a\leq a{{}^{\scriptscriptstyle\ast}}\Rightarrow a=\mathbf{0}$ for all $a\in\mathbf{\Sigma}$.∎

In other words, a poc set is a transitive and anti-symmetric PCR over $\mathbf{\Sigma}$ whose only negligible element is $\mathbf{0}$ .

Proposition 2.24 (canonical quotient).

For any non-degenerate PCR $G$ there exists a surjective PCR morphism $\pi_{G}:G\to\widehat{G}$ of $G$ onto a poc set $\widehat{G}$ such that, for any poc set $\mathbf{P}$, any PCR morphism $f:G\to\mathbf{P}$ gives rise to one and only one PCR morphism $\widehat{f}:\widehat{G}\to\mathbf{P}$ satisfying $f=\widehat{f}\circ\pi_{G}$.

Proof.

We defer the proof to Section B.4, but define the canonical quotient mapping here. We set:

[TABLE]

and let $\widehat{\mathbf{\Sigma}}:=\left\{\pi_{G}(a)\left|a\in\mathbf{\Sigma}\right.\right\}$, and set $\pi_{G}(a)\leq\pi_{G}(b)$ to hold in $\widehat{G}$ if and only if $ab\in G$. It remains to verify that (1) $\widehat{\mathbf{\Sigma}}$ is a well-defined PCS; (2) $\widehat{G}$ is a well-defined poc set structure over $\widehat{\mathbf{\Sigma}}$; and (3) the assertions of the proposition hold.∎

One should view this result as stating the precise conditions necessary for presenting a poc set in terms of a set of generators and a set of relations. However, the emphasis on what happens to morphisms leads to powerful realizations about dual spaces:

Corollary 2.25 (all duals are poc set duals).

If $G$ is a non-degenerate PCR then $\pi_{G}^{\circ}:\widehat{G}^{\circ}\to G^{\circ}$ is a bijection.

Proof.

See Section B.5.∎

Corollary 2.26 (naturality of canonical quotients).

Let $G,H$ be non-degenerate PCRs. Then, for every morphism $f:G\to H$ there exists one and only one morphism $\hat{f}:\widehat{G}\to\widehat{H}$ satisfying $\pi_{H}\circ f=\widehat{f}\circ\pi_{G}$ .

Proof.

See Section B.6.∎

A particular consequence of the last corollary is that one also has $\pi_{H}^{\circ}\circ\widehat{f}^{\,\circ}=f^{\circ}\circ\pi_{G}^{\circ}$ . This means the dual maps of $f$ and $\hat{f}$ coincide up to the identifications between the pre- and post-projection duals. Thus, any results about poc set duals apply to duals of PCRs. In the next two sections we review these results, and then harness them in our construction of the universal memory architecture (UMA).

2.4. Convexity theory of PCR duals.

To discuss the geometry of PCR duals, we need to endow PCRs with more structure. From this point on, all PCRs we consider will be finite, with the sole possible exception of power sets.

Definition 2.27 (Hamming metric).

Let $G$ be a PCR over $\mathbf{\Sigma}$. The Hamming metric on $G^{\circ}$ is defined by $\mathbf{\Delta}\!\left(u,w\right)=\left|\pi_{G}(u)\smallsetminus\pi_{G}(w)\right|$, where $\pi_{G}:G\to\widehat{G}$ is the canonical quotient map. We define $\mathtt{Dual}\!\left(G\right)$ to be the simple graph (that is: loopless, unoriented, with no multiple edges) with vertex set $G^{\circ}$, and edges of the form $\{u,w\}$ for all $u,w\in G^{\circ}$ with $\mathbf{\Delta}\!\left(u,w\right)=1$.∎

In the case when $G$ is already a poc set, two vertices $u,w\in\mathbf{P}^{\circ}$ form an edge if and only if $u\smallsetminus w$ is a singleton, that is: the perceptual classes represented by $u$ and $w$ differ by the truth value of a single query. The common edge they span in the Hamming cube $\mathbf{S}(\mathbf{\Sigma})$ corresponds to the $\ast$-selection $u\cap w$ in the concept presentation. In the general case ($G$ not necessarily a poc set), since both $u$ and $w$ are coherent, each is the union of $N(G){{}^{\scriptscriptstyle\ast}}$ with a number of $G$-equivalence classes $[a]$, $a\notin N(G)$ (recall Definition 2.7 and Proposition 2.24). Thus $u$ and $w$ span an edge in $\mathtt{Dual}\!\left(G\right)$ if and only if $u\smallsetminus w=[a]$ for some $a\notin N(G)\cup N(G){{}^{\scriptscriptstyle\ast}}$. Intuitively, we think of the different $b\in[a]$ as counting as a single Boolean query.

We briefly recall the graph-theoretic notion of convexity:

Definition 2.28 (convexity in graphs).

Let $\Gamma=(V,E)$ be a graph and let $u,v\in V$. The hop distance $d_{\Gamma}(u,v)$ is defined to be the minimum length of an edge-path in $\Gamma$ joining $u$ with $v$. The interval $I(u,v)$ is defined to be the set of all vertices $w\in V$ satisfying the equality $d_{\Gamma}(u,v)=d_{\Gamma}(u,w)+d_{\Gamma}(w,v)$. A set $C\subseteq V$ is said to be convex in $\Gamma$, if $I(u,v)\subseteq C$ holds for all $u,v\in C$. A set $H\subseteq V$ is a half-space of $\Gamma$, if both $H$ and $V\smallsetminus H$ are convex sets in $\Gamma$. Finally, we denote by $\mathcal{H}(\Gamma)$ the poc set whose elements are the half-spaces of $\Gamma$ (note that $\varnothing$ is a half-space of $\Gamma$), ordered by inclusion, and with $H{{}^{\scriptscriptstyle\ast}}:=V\smallsetminus H$.∎
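For the case where $G$ is already a poc set, so that $\mathbf{\Delta}(u,w)=|u\smallsetminus w|$, the following sketch builds $\mathtt{Dual}(G)$ from its vertex set and computes hop distances, intervals and convexity exactly as in Definition 2.28, by breadth-first search. Vertices are assumed to be frozensets of literals, the graph is assumed connected, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def dual_graph(vertices):
    """Adjacency of Dual(G) for a poc set G: u ~ w iff Delta(u, w) = |u - w| = 1."""
    return {u: {w for w in vertices if len(u - w) == 1} for u in vertices}


def hop_distances(adj, source):
    """Breadth-first search distances d(source, .) in the graph."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def interval(adj, u, v):
    """I(u, v): all w with d(u, v) = d(u, w) + d(w, v)."""
    du, dv = hop_distances(adj, u), hop_distances(adj, v)
    return {w for w in adj if w in du and w in dv and du[w] + dv[w] == du[v]}


def is_convex(adj, C):
    """C is convex iff it contains the interval between any two of its vertices."""
    return all(interval(adj, u, v) <= set(C) for u, v in combinations(C, 2))


# Assumed example: the dual of the PCR over {a, b} generated by a -> b.
u1, u2, u3 = (frozenset(s) for s in ({"T", "a", "b"}, {"T", "~a", "b"}, {"T", "~a", "~b"}))
adj = dual_graph({u1, u2, u3})
print(hop_distances(adj, u1)[u3], len(u1 - u3))
```

On this three-vertex path the hop distance between the endpoints is 2, agreeing with $\mathbf{\Delta}$, and the two endpoints alone do not form a convex set, while $\{u_{1},u_{2}\}=\mathfrak{h}(b;G)$ and its complement $\{u_{3}\}$ are both convex.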

We refer the reader to [38], section 4, for the (very elegant and much more general) proofs of the following two lemmas (stated there for poc sets, but valid for finite non-degenerate PCRs as well, due to Proposition 2.24 and its two corollaries):

Lemma 2.29.

Let $G$ be a finite non-degenerate PCR. Then the hop metric on $\mathtt{Dual}\!\left(G\right)$ coincides with the metric $\mathbf{\Delta}$ .∎

Lemma 2.30.

Let $G$ be a finite non-degenerate PCR. Then the half-spaces of $\mathtt{Dual}\!\left(G\right)$ are precisely the subsets of $G^{\circ}$ of the form (note that, by Proposition 2.17, $\mathfrak{h}(a^{\ast};G)=G^{\circ}\smallsetminus\mathfrak{h}(a;G)$ for all $a\in\mathbf{\Sigma}$)

$$\mathfrak{h}(a;G):=\left\{u\in G^{\circ}\,\middle|\,a\in u\right\},\qquad a\in\mathbf{\Sigma}.$$

In particular, subsets of $G^{\circ}$ of the form

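$$\mathfrak{h}(S;G):=\bigcap_{a\in S}\mathfrak{h}(a;G)=\left\{u\in G^{\circ}\,\middle|\,S\subseteq u\right\}$$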

are convex in $\mathtt{Dual}\!\left(G\right)$ , for any $S\subseteq G$ .∎

Definition 2.31.

To simplify notation, we will abuse it in the following ways:

•

Writing $\mathfrak{h}(a)$, $\mathfrak{h}(S)$ without specifying $G$ will henceforth refer to the subsets $\mathfrak{h}(a;\mathbb{O}(\mathbf{\Sigma}))$ and $\mathfrak{h}(S;\mathbb{O}(\mathbf{\Sigma}))$ of $\mathbb{H}(\mathbf{\Sigma})$, respectively.

•

When $S$ is explicitly known, $S=\{a_{1},\ldots,a_{k}\}$ , we will write $\mathfrak{h}(a_{1}\cdots a_{k};G)$ instead of $\mathfrak{h}(S;G)$ when convenient.

As a side note, observe that $\mathfrak{h}(S;G)=\mathfrak{h}(S)\cap G^{\circ}$, where $\mathfrak{h}(S)$ coincides with the vertex set of a face of the Hamming cube $\mathbb{H}(\mathbf{\Sigma})$. In particular, presenting any subset of $G^{\circ}$ as a concept is equivalent to decomposing it as a union of convex subsets of $G^{\circ}$.

Median Graphs.

The two preceding lemmas are results of $\mathtt{Dual}\!\left(G\right)$ being a median graph [8, 49]:

Definition 2.32.

A connected simple graph $\Gamma=(V,E)$ is said to be a median graph, if the set $I(u,v)\cap I(v,w)\cap I(u,w)$ contains exactly one vertex for each $u,v,w\in V$ . This vertex is the median of the triple $(u,v,w)$ and denoted by $\mathrm{med}\!\left(u,v,w\right)$ – see Figure 1. For median graphs $\Gamma_{i}=(V_{i},E_{i})$ , $i=1,2$ , a median morphism of $\Gamma_{1}$ to $\Gamma_{2}$ is a map $f\colon V_{1}\to V_{2}$ which preserves medians: $f(\mathrm{med}\!\left(u,v,w\right))=\mathrm{med}\!\left(f(u),f(v),f(w)\right)$ . ∎

Median graphs are a special subfamily of median algebras, [44, 45, 23, 5]. Some modern generalizations and applications may be found in [7].

A central result in Sageev-Roller duality, specialized here to the finite case, and reformulated for non-degenerate PCRs is:

Theorem 2.33.

The dual $\mathtt{Dual}\!\left(G\right)$ of a finite non-degenerate PCR $G$ is a finite median graph, with the median calculated according to the formula:

$$\mathrm{med}\!\left(u,v,w\right)=(u\cap v)\cup(v\cap w)\cup(w\cap u),$$

and with intervals in $\mathtt{Dual}\!\left(G\right)$ calculated according to the formula:

$$I(u,v)=\left\{w\in G^{\circ}\,\middle|\,u\cap v\subseteq w\right\}.$$

Conversely, if $\Gamma$ is a finite median graph then $\Gamma$ is naturally isomorphic to $\mathtt{Dual}\!\left(\mathcal{H}(\Gamma)\right)$ by sending every vertex $v$ to the $\ast$ -selection of all half-spaces of $\Gamma$ which contain $v$ .∎
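The median can be sketched as a majority vote, $(u\cap v)\cup(v\cap w)\cup(w\cap u)$, the standard median of Sageev-Roller duality; the brute-force `median_by_intervals` cross-checks it against Definition 2.32, using $\mathbf{\Delta}$ in place of the hop metric as Lemma 2.29 permits. This assumes the poc-set case, with vertices as frozensets of literals; all names are illustrative.

```python
from itertools import product


def hamming_cube(atoms):
    """Complete *-selections containing "T" (hypothetical literal encoding)."""
    for signs in product((True, False), repeat=len(atoms)):
        yield frozenset(["T"] + [a if s else "~" + a for a, s in zip(atoms, signs)])


def median(u, v, w):
    """Majority vote: keep the literals selected by at least two of u, v, w."""
    return (u & v) | (v & w) | (u & w)


def delta(u, w):
    """Hamming metric in the poc-set case: number of queries where u and w differ."""
    return len(u - w)


def in_interval(x, u, v):
    """x lies in I(u, v) iff delta(u, v) = delta(u, x) + delta(x, v)."""
    return delta(u, v) == delta(u, x) + delta(x, v)


def median_by_intervals(u, v, w, vertices):
    """Brute-force Definition 2.32: vertices lying in all three pairwise intervals."""
    return [x for x in vertices
            if in_interval(x, u, v) and in_interval(x, v, w) and in_interval(x, u, w)]


cube = list(hamming_cube(["a", "b", "c"]))
u = frozenset({"T", "a", "b", "c"})
v = frozenset({"T", "a", "~b", "~c"})
w = frozenset({"T", "~a", "b", "~c"})
print(sorted(median(u, v, w)), median_by_intervals(u, v, w, cube) == [median(u, v, w)])
```

In the full cube over $\{a,b,c\}$, the median of these three vertices is $\{\top,a,b,\neg c\}$, the unique vertex lying in all three intervals.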

This result is the consequence of a very strong convexity theory:

Theorem 2.34 (Properties of median graphs, [43], section 2).

Let $\Gamma=(V,E)$ be a finite median graph. Then:

(1)

Any family of pairwise intersecting convex sets has a common vertex;

(2)

Every convex set is an intersection of halfspaces;

(3)

For any convex subset $K\subset V$, the subgraph of $\Gamma$ induced by $K$ is a median graph;

(4)

For any convex $K\subset V$ and any $v\in V\smallsetminus K$ there is a unique vertex $\mathtt{proj}_{K}(v)\in K$ at minimum hop distance from $v$;

(5)

For any convex $K\subset V$, the nearest point projection $\mathtt{proj}_{K}({\scriptscriptstyle\bullet})$ is a median preserving, distance non-increasing retraction of $\Gamma$ onto its subgraph induced by $K$.

Property (1) is often referred to as the Helly property.∎

The Helly property is, perhaps, the most notable of the results stated above. In our setting of PCR duals, it may be interpreted as guaranteeing the satisfiability of any family of conjunctive monomials over $\mathbb{A}$ in which every pair is separately satisfiable.
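To illustrate this reading in the simplest case, the minimal Python sketch below (our own illustration) takes the full Hamming cube on three Boolean queries, i.e., informally, the dual of a PCR with no implications between distinct queries, where every conjunctive monomial carves out a subcube. It exhaustively checks that every pairwise satisfiable family of three or four monomials is jointly satisfiable. This is a finite check on one example, not a proof.

```python
from itertools import combinations, product

N_VARS = 3
CUBE = list(product((False, True), repeat=N_VARS))


def models(monomial):
    """Satisfying assignments of a conjunction of literals, given as {variable index: value}."""
    return frozenset(u for u in CUBE if all(u[i] == val for i, val in monomial.items()))


def all_monomials():
    """Every conjunctive monomial: each variable is absent, required False, or required True."""
    for choice in product((None, False, True), repeat=N_VARS):
        yield {i: c for i, c in enumerate(choice) if c is not None}


def helly_holds(family_size):
    """True iff every pairwise satisfiable family of monomials is jointly satisfiable."""
    model_sets = [models(m) for m in all_monomials()]
    for family in combinations(model_sets, family_size):
        if all(s & t for s, t in combinations(family, 2)):
            if not frozenset.intersection(*family):
                return False
    return True


print(helly_holds(3), helly_holds(4))  # True True
```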

Convex hulls.

Given the central role of half-spaces in the convexity theory of median graphs, a notion of the set of half-spaces dual to a given set of vertices is useful:

Definition 2.35.

For $K\subset\mathbb{H}$ , its dual set of halfspaces, $K^{\scriptscriptstyle{\sharp}}$ , is defined to be the set of all $a\in\mathbf{\Sigma}$ with $K\subseteq\mathfrak{h}(a)$ .∎

An immediate corollary of Theorem 2.34(2) is:

Corollary 2.36.

Suppose $G$ is a non-degenerate PCR, and $K\subseteq G^{\circ}$ . Then $K^{\scriptscriptstyle{\sharp}}$ is $G$ -coherent and forward-closed, and the convex hull $\mathtt{hull}\!\left(K;G\right)$ of $K$ in $\mathtt{Dual}\!\left(G\right)$ coincides with $\mathfrak{h}(K^{\scriptscriptstyle{\sharp}};G)$ .∎
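A minimal Python sketch of Definition 2.35 and Corollary 2.36 in the simplest case, where $G$ has no implications and $G^{\circ}$ is the whole cube. Literals are encoded as (coordinate, value) pairs, an encoding of our own; `sharp` collects the half-spaces containing every vertex of $K$, and the hull is the set of vertices satisfying all of them. The example shows the hull of two vertices being a whole 2-dimensional face.

```python
from itertools import product

N = 3
CUBE = list(product((0, 1), repeat=N))
LITERALS = [(i, val) for i in range(N) for val in (0, 1)]  # (coordinate, value) pairs


def h(S):
    """h(S): cube vertices satisfying every literal in S."""
    return {u for u in CUBE if all(u[i] == val for i, val in S)}


def sharp(K):
    """K#: the literals (half-spaces) containing every vertex of K."""
    return {a for a in LITERALS if all(u[a[0]] == a[1] for u in K)}


def hull(K):
    """Convex hull of K computed as h(K#), here with G° equal to the whole cube."""
    return h(sharp(K))


K = {(0, 0, 0), (0, 1, 1)}
print(sorted(sharp(K)))  # [(0, 0)], i.e. "the first query is false"
print(sorted(hull(K)))   # [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
```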

Thus, every convex subset of $G^{\circ}$ may be written as $\mathfrak{h}(S;G)$ for some $S\in\mathbf{C}(G)$ . By the last assertion of the following lemma, this representation is unique:

Lemma 2.37.

Let $G$ be a non-degenerate PCR over $\mathbf{\Sigma}$ . Then, for all $S,S_{1},S_{2}\subset\mathbf{\Sigma}$ :

(1) $\mathfrak{h}(S;G)\neq\varnothing$ if and only if $S$ is coherent;

(2) For all $S_{1},S_{2}\subseteq\mathbf{\Sigma}$ one has $\mathfrak{h}(S_{1}\cup S_{2};G)=\mathfrak{h}(S_{1};G)\cap\mathfrak{h}(S_{2};G)$ ;

(3) If $S_{2}\subseteq S_{1}$ then $\mathfrak{h}(S_{1};G)\subseteq\mathfrak{h}(S_{2};G)$ ;

(4) If $S$ is coherent then $\mathfrak{h}(S;G)=\mathfrak{h}(S\!\uparrow;G)$ ;

(5) If $S\in\mathbf{C}(G)$ , then $\mathfrak{h}(S;G)=\mathfrak{h}(\min(S);G)$ ;

(6) For all $S_{1},S_{2}\in\mathbf{C}(G)$ one has $\mathfrak{h}(S_{1};G)\subseteq\mathfrak{h}(S_{2};G)\Leftrightarrow S_{2}\subseteq S_{1}$ .

Proof.

See Section C.1. ∎

Another important result helps bound the distance from the points of one convex set to another:

Lemma 2.38.

Let $S,T\in\mathbf{C}(G)$ for a poc set $G$ over $\mathbf{\Sigma}$ . Then $\mathbf{\Delta}\!\left(u,\mathfrak{h}(T)\right)\leq\left|T\smallsetminus S\right|$ for all $u\in\mathfrak{h}(S)$ .

Proof.

See Section C.4.2. ∎

This motivates the following definition for the general case:

Definition 2.39.

Let $G$ be a non-degenerate PCR over $\mathbf{\Sigma}$ and let $S,T\in\mathbf{C}(G)$ . The divergence of $S$ from $T$ is defined to be $\mathtt{Div}(S;T):=\left|T\smallsetminus S\right|$ .∎

Note that $\mathtt{Div}(S;T)$ appears to be independent of $G$ ; it is not, however, since it is only applied to $G$ -coherent, upward-closed sets $S,T$ , that is, to elements of $\mathbf{C}(G)$ . We will use this notion of divergence in Section 5.3 to drive the decision-making mechanism of the binary UMA agents briefly introduced there.
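The bound of Lemma 2.38, and hence the meaning of $\mathtt{Div}$, can be checked exhaustively in a small case. The Python sketch below is our own illustration under a simplifying assumption: $G$ has no implications and the constants $\bot,\top$ are ignored, so that elements of $\mathbf{C}(G)$ are just consistent sets of literals, encoded as (coordinate, value) pairs.

```python
from itertools import product

N = 3
CUBE = list(product((0, 1), repeat=N))


def h(S):
    """h(S): cube vertices satisfying every literal (coordinate, value) in S."""
    return [u for u in CUBE if all(u[i] == val for i, val in S)]


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


def div(S, T):
    """Div(S; T): the number of literals of T that are not in S."""
    return len(set(T) - set(S))


def consistent_literal_sets():
    """Every consistent set of literals, i.e. every non-empty subcube h(S)."""
    for choice in product((None, 0, 1), repeat=N):
        yield frozenset((i, c) for i, c in enumerate(choice) if c is not None)


def lemma_2_38_holds():
    """Check dist(u, h(T)) <= Div(S; T) for all u in h(S), over all consistent S, T."""
    sets = list(consistent_literal_sets())
    for S in sets:
        for T in sets:
            target = h(T)
            for u in h(S):
                if min(hamming(u, v) for v in target) > div(S, T):
                    return False
    return True


S = frozenset({(0, 1)})                    # "first query is true"
T = frozenset({(0, 1), (1, 0), (2, 0)})    # "first true, second and third false"
print(div(S, T), lemma_2_38_holds())  # 2 True
```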

More details about the convexity theory of a median graph will be discussed in the appendices, as we go about proving our algorithmic results.

2.5. Propagation: A Computational Workhorse.

We are now ready to present another central result of this paper: a low-complexity method for computing nearest point projections in $\mathtt{Dual}\!\left(G\right)$ , which we call propagation. This method obviates the need for maintaining an explicit representation of each vertex of $\mathtt{Dual}\!\left(G\right)$ in memory, reducing space requirements for this architecture from $O(2^{\left|\mathbf{\Sigma}\right|})$ in the worst case to $O(\left|\mathbf{\Sigma}\right|^{2})$ . The time complexity is, at worst, $O(\left|\mathbf{\Sigma}\right|^{2})$ , coming down to sub-linear on a fully parallel architecture, as will become evident below.

Definition 2.40 (coherent projection).

Let $G$ be a PCR over a finite PCS $\mathbf{\Sigma}$ . For any $T\subseteq\mathbf{\Sigma}$ , the set $\mathtt{coh}_{G}(T):=T\!\uparrow\smallsetminus T{{}^{\scriptscriptstyle\ast}}\!\downarrow=T\!\uparrow\smallsetminus(T\!\uparrow){{}^{\scriptscriptstyle\ast}}$ is said to be the $G$ -coherent projection of $T$ .∎
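Definition 2.40 reduces to one graph search plus a set difference. The minimal Python sketch below assumes an encoding of our own: literals are strings `'x'`/`'~x'`, $G$ is given as a list of rules $a\to b$ and stored together with the contrapositives $b^{\ast}\to a^{\ast}$, and the constants $\bot,\top$ are omitted. In the second example the observation is incoherent: `rain` and `~wet` cancel each other out, while `slippery` (implied, never contradicted) and the unrelated `cold` survive.

```python
from collections import deque


def neg(a):
    """The complement a* of a literal written as 'x' or '~x'."""
    return a[1:] if a.startswith("~") else "~" + a


def make_pcr(rules):
    """Adjacency of G: each rule a -> b is stored together with its contrapositive b* -> a*."""
    edges = {}
    for a, b in rules:
        edges.setdefault(a, set()).add(b)
        edges.setdefault(neg(b), set()).add(neg(a))
    return edges


def up(G, T):
    """T↑: all literals reachable from T along the edges of G (T included)."""
    seen, queue = set(T), deque(T)
    while queue:
        a = queue.popleft()
        for b in G.get(a, ()):
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def coh(G, T):
    """coh_G(T): T↑ minus (T↑)*."""
    closure = up(G, T)
    return closure - {neg(a) for a in closure}


G = make_pcr([("rain", "wet"), ("wet", "slippery")])
print(sorted(coh(G, {"rain"})))                  # ['rain', 'slippery', 'wet']
print(sorted(coh(G, {"rain", "~wet", "cold"})))  # ['cold', 'slippery']
```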

Coherent projection itself plays an important role in obtaining an observer’s belief state from its epistemic state (the learned PCR structure) and the latest observation (see Section 3.3).

The promised formula for computing projections works as follows.

Proposition 2.41.

Let $G$ be a PCR over a finite PCS $\mathbf{\Sigma}$ . Let $S,T\subset\mathbf{\Sigma}$ and suppose $S$ is $G$ -coherent. Let $L=\mathfrak{h}(S;G)$ and $K=\mathfrak{h}(\mathtt{coh}_{G}(T);G)$ . Then:

[TABLE]

where $\mathtt{proj}_{K}({\scriptscriptstyle\bullet})$ is the nearest-point projection to $K$ in $\mathtt{Dual}\!\left(G\right)$ defined in Theorem 2.34.

Proof.

See Section C.4. ∎

This description of nearest point projection is easy to visualize as being computed by an algorithm propagating excitation among nodes of a directed graph:

Definition 2.42.

Let $G$ be a PCR over a finite PCS $\mathbf{\Sigma}$ . Let $S\subset\mathbf{\Sigma}$ . Denote by $[G,S]$ the graph with vertex set $\mathbf{\Sigma}$ , edge set $G$ and with Boolean weights $\lambda(a)=\mathds{1}_{S}(a)$ , $a\in\mathbf{\Sigma}$ attached to its vertices. We refer to it as $G$ being loaded with $S$ .∎

Definition 2.43.

A propagation algorithm over $G$ is any algorithm which, for any $G$ -coherent load $S\subset\mathbf{\Sigma}$ and any $T\subseteq\mathbf{\Sigma}$ , accepts $[G,S]$ and $T$ as input and produces as its output the loaded graph $[G,\mathtt{PROP}(T,S;G)]$ , where

[TABLE]

Note that the coherent projection is obtained as $\mathtt{coh}_{G}(T)=\mathtt{PROP}(T,\varnothing;G)$ .∎

Envisioning $G$ as describing a graph of ‘cells’ labeled by $\mathbf{\Sigma}$ and ‘synapses’ labeled by pairs $ab\in G$ , the loaded graph $[G,S]$ represents a state of the network in which the cells of $S$ are excited. A propagation algorithm should be seen as additionally exciting the cells of $T$ and spreading this excitation along the directed connections, while inhibiting $a{{}^{\scriptscriptstyle\ast}}$ for each cell $a\in\mathbf{\Sigma}$ encountered along the way. On a modern-day computer this can be achieved in time quadratic in $\left|\mathbf{\Sigma}\right|$ . For example, propagation could be implemented using a variant of depth-first search (DFS) on $\mathbf{\Gamma}\big{|}_{\scriptscriptstyle{t}}$ , while maintaining an expanding record of visited vertices [9] (see Algorithm 1). On a fully parallel machine allowing the ‘cells’ to compute their own excitation, the time complexity is clearly of the order of the longest directed vertex path in the network, which is sub-linear in $\left|\mathbf{\Sigma}\right|$ .
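The ‘cells’ picture can be simulated directly. The Python sketch below is our own illustration, not Algorithm 1 (whose pseudocode is not reproduced here): every cell updates in lock-step rounds, becoming excited if it lies in $T$ or has an excited predecessor, and each excited $a$ finally inhibits $a^{\ast}$, which yields $\mathtt{coh}_{G}(T)$. The rounds are simulated sequentially; their count stands in for parallel time and is bounded by the length of the longest directed path.

```python
def neg(a):
    """The complement a* of a literal written as 'x' or '~x'."""
    return a[1:] if a.startswith("~") else "~" + a


def propagate_in_lockstep(edges, T):
    """Synchronous excitation rounds followed by inhibition of complements.

    Returns (T↑ minus (T↑)*, number of rounds in which the excitation grew)."""
    preds = {}
    for a, succs in edges.items():
        for b in succs:
            preds.setdefault(b, set()).add(a)
    cells = set(edges) | set(preds) | set(T)
    excited, rounds = set(T), 0
    while True:
        # every cell decides its own next state from its predecessors only
        nxt = {c for c in cells if c in excited or preds.get(c, set()) & excited}
        if nxt == excited:
            break
        excited, rounds = nxt, rounds + 1
    inhibited = {neg(a) for a in excited}   # each excited a inhibits a*
    return excited - inhibited, rounds


chain = {"a": {"b"}, "b": {"c"}, "c": {"d"},        # a -> b -> c -> d
         "~d": {"~c"}, "~c": {"~b"}, "~b": {"~a"}}  # contrapositives
result, rounds = propagate_in_lockstep(chain, {"a"})
print(sorted(result), rounds)  # ['a', 'b', 'c', 'd'] 3
```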

We now turn to a high-level description of the UMA architecture and its use of the results of this section.

3. Universal Memory Architecture (UMA): a High-Level View.

In this section we provide a high-level description of the basic UMA functionalities: PCR update/revision and maintaining a belief state.

3.1. Observation Model.

Recall from Section 2.1.1 that an observer is given a set $\mathbb{A}$ of initial Boolean queries over the space of histories $\mathbf{X}$ of the observed system. The system of queries and their complements is modeled as a PCS morphism $\rho:\mathbf{\Sigma}(\mathbb{A})\to\mathbf{2}^{\mathbf{X}}$ , which is unknown to the observer. The observer is presented with a sequence of observations $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t}}\in\mathbb{H}(\mathbf{\Sigma}(\mathbb{A}))$ , and values $\varphi\big{|}_{\scriptscriptstyle{t}}\in\mathds{R}_{{}_{\geq 0}}$ , $t\geq 0$ , one per update cycle. One must distinguish between two settings:

**Static signal.**

The value signal $\varphi\big{|}_{\scriptscriptstyle{t}}$ only depends on the raw observation $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t}}$ ;

**Dynamic signal.**

The value signal may produce $\varphi\big{|}_{\scriptscriptstyle{t}}\neq\varphi\big{|}_{\scriptscriptstyle{s}}$ while $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t}}=\mathtt{Obs}\big{|}_{\scriptscriptstyle{s}}$ .

While we are ultimately interested in the dynamic setting, this paper deals only with the static one. A static setting is by no means an unchanging one, however: we will see in Section 5 that instances of the static setting may nevertheless have rich and interesting dynamics. This happens, in part, as a result of introducing delayed queries, defined as follows. If $\sharp:\mathbf{X}\to\mathbf{X}$ denotes the operation of truncating the last state from a given history, then, for any conjunction $a$ of already available queries, it is possible to introduce a new query $\sharp a$ , which reports its value according to the rule $x\in\rho(\sharp a)\Leftrightarrow\sharp x\in\rho(a)$ , $x\in\mathbf{X}$ .¹⁰ Implementing this operation requires the UMA architecture to retain the latest raw observation, but this seems a small price to pay for extending the range of application of the static setting.

¹⁰ Here and henceforth we abuse notation, using the symbol $\sharp$ to denote both a delayed query and the history truncation (shift) operator; which is which will be clear from the context.
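Delayed queries are easy to express once queries are viewed as predicates on histories. In the minimal Python sketch below, the state keys `door_open` and `light_on` are hypothetical, and treating $\sharp a$ as false on a history too short to truncate is our own assumption (the text does not specify this case). A streaming implementation would only need to keep the previous raw observation, which is the "small price" mentioned above.

```python
from typing import Callable, Mapping, Sequence

History = Sequence[Mapping[str, bool]]   # a history: a sequence of raw states
Query = Callable[[History], bool]


def state_query(key: str) -> Query:
    """A basic query reading one Boolean from the latest state of a history."""
    return lambda x: bool(x[-1][key])


def conj(*queries: Query) -> Query:
    """Conjunction of already available queries."""
    return lambda x: all(q(x) for q in queries)


def delayed(a: Query) -> Query:
    """The delayed query #a: x is in rho(#a) iff the truncated history #x is in rho(a).

    ASSUMPTION: #a is False on histories with fewer than two states."""
    return lambda x: len(x) > 1 and a(x[:-1])


door, light = state_query("door_open"), state_query("light_on")
history = [{"door_open": True, "light_on": True}, {"door_open": False, "light_on": True}]
was_open_and_lit = delayed(conj(door, light))
print(door(history), was_open_and_lit(history))  # False True
```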

The basic task of a UMA is to evolve a sequence $G\big{|}_{\scriptscriptstyle{t}}$ , $t\geq 0$ of non-degenerate PCRs over $\mathbf{\Sigma}=\mathbf{\Sigma}(\mathbb{A})$ while aiming for the PCRs $G\big{|}_{\scriptscriptstyle{t}}$ to eventually satisfy the following:

•

**‘Completeness’:** $\rho:G\big{|}_{\scriptscriptstyle{t}}\to\mathbf{2}^{\mathbf{X}}$ is a PCR morphism, ensuring that every perceptual class is represented;

•

**‘Precision’:** $\mathbf{M}\big{|}_{\scriptscriptstyle{t}}:=\mathtt{Dual}\!\left(G\big{|}_{\scriptscriptstyle{t}}\right)$ is as close as possible to the true model space $\mathbf{M}:=\rho{{}^{\scriptscriptstyle\ast}}(\mathbf{X})$ .

These requirements should not be taken literally, however. For example, it stands to reason that in some contexts the observer could afford to misclassify a few perceptual classes of low import. We will see how—at least under some of the learning schemes we propose—these vague requirements become possible to state precisely in terms of PAC learning.

3.2. Maintaining a PCR presentation: Snapshot Structures.

A rather restrictive notion of a snapshot structure—a method for learning a poc set structure from positive observations—was introduced by the authors in [19]. Here we merely review the main ideas to provide intuition, while deferring the formal constructions to Section 4.

Snapshot weights.

Motivated loosely by Hebbian ideas about learning [21], we consider maintaining an evolving symmetric system of weights $\mathtt{w}_{{\scriptscriptstyle\bullet}}\big{|}_{\scriptscriptstyle{t}}=(\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}})_{a,b\in\mathbf{\Sigma}\big{|}_{\scriptscriptstyle{t}}}$ , with $\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}}$ quantifying in some prescribed way a notion of cumulative degree of relevance of the event $\mathfrak{h}(ab)$ to the observer, at time $t$ .

In addition, rules for maintaining $\mathtt{w}_{{\scriptscriptstyle\bullet}}\big{|}_{\scriptscriptstyle{t}}$ as time progresses must be provided: first, a completion rule, to insert missing values into $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ when it undergoes an extension; second, an update rule, computing $\mathtt{w}_{{\scriptscriptstyle\bullet}}\big{|}_{\scriptscriptstyle{t+1}}$ from $\mathtt{w}_{{\scriptscriptstyle\bullet}}\big{|}_{\scriptscriptstyle{t}}$ and the incoming observation.

It is important for both rules to be as simple, and as local, as possible, so as not to sacrifice tractability. In our constructions, we constrain the update laws to ones where $\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t+1}}$ depends only on $\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}}$ , the value signal $\varphi\big{|}_{\scriptscriptstyle{t+1}}$ , the truth value of the bit $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t+1}}\in\mathfrak{h}(ab)$ , and possibly on global parameters (e.g., the system clock $t$ ).
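The locality constraint can be phrased as a type: an update law is a function of four arguments only, applied independently to every pair. The Python sketch below is a hypothetical illustration of that shape; the `accumulate` rule is ours, chosen only to make the example run, and is not one of the snapshot structures constructed in Section 4.

```python
from itertools import combinations_with_replacement
from typing import Callable, Dict, FrozenSet, Set

Pair = FrozenSet[str]
# A local rule sees only (old w_ab, value signal, the bit "Obs in h(ab)", clock t).
LocalRule = Callable[[float, float, bool, int], float]


def init_weights(literals) -> Dict[Pair, float]:
    """One weight per unordered pair {a, b}; the pair {a, a} carries w_a = w_aa."""
    return {frozenset(p): 0.0 for p in combinations_with_replacement(literals, 2)}


def snapshot_update(w: Dict[Pair, float], obs: Set[str], phi: float, t: int,
                    rule: LocalRule) -> Dict[Pair, float]:
    """Apply the same local rule to every pair independently (trivially parallelizable)."""
    return {pair: rule(old, phi, pair <= obs, t) for pair, old in w.items()}


def accumulate(old: float, phi: float, bit: bool, t: int) -> float:
    """Hypothetical rule, for illustration only: add the value signal whenever h(ab) occurs."""
    return old + phi if bit else old


w = init_weights(["x", "~x", "y", "~y"])
w = snapshot_update(w, {"x", "~y"}, 2.0, 1, accumulate)
print(w[frozenset({"x", "~y"})], w[frozenset({"x", "y"})])  # 2.0 0.0
```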

PCRs from snapshot weights.

Inspired by the rough mechanism proposed in [19], we seek weight systems $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ for which the loosely specified rule—

[TABLE]

ranging over all $a,b\in\mathbf{\Sigma}$ with $\{a,a{{}^{\scriptscriptstyle\ast}}\}\neq\{b,b{{}^{\scriptscriptstyle\ast}}\}$ is guaranteed to define a non-degenerate PCR over $\mathbf{\Sigma}$ . The motivation for the rule is, of course, the fact that $\rho(a)\cap\rho(b{{}^{\scriptscriptstyle\ast}})=\varnothing$ is equivalent to $\rho(a)\subseteq\rho(b)$ , where $\rho$ is the PCS morphism defining the semantics of the queries in $\mathbf{\Sigma}$ .

Finally, note that the properties of a PCR are guaranteed (to the extent that the rule is well defined), so that non-degeneracy is the only remaining question. The precise notion of ‘negligible’ used when comparing weights is, of course, crucial, and is expected to greatly affect the quality and limitations of the emerging representations.

3.3. Maintaining a Belief State.

Since, at each time $t$ , we only get to observe states from $\mathbf{M}(\rho)$ , we face the problem of learning negative statements (that is, the list of $G\big{|}_{\scriptscriptstyle{t}}$ -incoherent pairs) from the stream of positive examples $(\mathtt{Obs}\big{|}_{\scriptscriptstyle{t}},\varphi\big{|}_{\scriptscriptstyle{t}})$ : from what we have observed so far, we must reason about what we might never encounter. Since the implication record $G\big{|}_{\scriptscriptstyle{t}}$ is inherently uncertain, with no guarantee at any time that the completeness requirement of Section 3.1 is met, the observation $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t+1}}$ may well land outside the model space $\mathbf{M}\big{|}_{\scriptscriptstyle{t+1}}=\mathtt{Dual}\!\left(G\big{|}_{\scriptscriptstyle{t+1}}\right)$ , even though it took part in forming this model space during the snapshot update. Indeed, its value may be too low to trigger a revision of $G\big{|}_{\scriptscriptstyle{t}}$ into a $G\big{|}_{\scriptscriptstyle{t+1}}$ for which $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t+1}}$ becomes coherent.

Contrary to the approach adopted by modern iterated revision schemes based on the work of Darwiche and Pearl [11], we do not insist on a revision forcing $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t+1}}$ into $\mathbf{M}\big{|}_{\scriptscriptstyle{t+1}}$ . Instead, we apply $G=G\big{|}_{\scriptscriptstyle{t+1}}$ to the raw observation with the aim of relaxing it, replacing it with a $G$ -coherent and forward-closed set:

$$\mathtt{Curr}^{t+1}:=\mathtt{coh}_{G}\!\left(\mathtt{Obs}\big{|}_{\scriptscriptstyle{t+1}}\right)$$

in the role of the current state of record, or the belief state. This way, UMA naturally resolves possible contradictions at the price of introducing ambiguity into its record of the current state: instead of marking a single vertex of $\mathbf{M}\big{|}_{\scriptscriptstyle{t+1}}$ as the current state, any vertex of the convex set $\mathfrak{h}(\mathtt{Curr}^{t+1})$ may turn out to be the correct current state from the observer’s point of view.
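The following minimal Python sketch shows the belief state at work on a toy PCR with the single rule `rain -> wet` (literal strings and rule encoding are our own; the constants $\bot,\top$ are omitted). The observation contradicts the rule, so the coherent projection keeps only `cold`, and the belief state becomes the convex set of model-space vertices containing it. Enumerating the model space is done here only for display: the point of the architecture is precisely to avoid storing $\mathtt{Dual}(G)$ explicitly.

```python
from collections import deque
from itertools import product


def neg(a):
    """The complement a* of a literal written as 'x' or '~x'."""
    return a[1:] if a.startswith("~") else "~" + a


def make_pcr(rules):
    """Adjacency of G, each rule a -> b stored with its contrapositive b* -> a*."""
    edges = {}
    for a, b in rules:
        edges.setdefault(a, set()).add(b)
        edges.setdefault(neg(b), set()).add(neg(a))
    return edges


def up(G, T):
    """T↑: literals reachable from T along edges of G."""
    seen, queue = set(T), deque(T)
    while queue:
        for b in G.get(queue.popleft(), ()):
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def coh(G, T):
    closure = up(G, T)
    return closure - {neg(a) for a in closure}


def model_space(G, variables):
    """G°: complete assignments (as literal sets) closed under the rules of G."""
    out = []
    for bits in product((True, False), repeat=len(variables)):
        u = frozenset(v if b else "~" + v for v, b in zip(variables, bits))
        if up(G, u) == u:
            out.append(u)
    return out


G = make_pcr([("rain", "wet")])
obs = {"rain", "~wet", "cold"}                  # violates rain -> wet
curr = coh(G, obs)                              # the belief state Curr
belief = [u for u in model_space(G, ("rain", "wet", "cold")) if curr <= u]
print(sorted(curr), len(belief))                # ['cold'] 3
```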

The choice of the coherent projection for the purpose of forming the belief state is motivated by its geometric and categorical properties. In our class of model spaces it is a canonical method of producing coherent sets, as witnessed by the following two results:

Proposition 3.1 (Coherent Approximation).

Let $G$ be a PCR over $\mathbf{\Sigma}$ . Then, for any $A\in\mathbb{H}(\mathbf{\Sigma})$ , if $B\in G^{\circ}$ realizes the Hamming distance

$$\mathbf{\Delta}\!\left(A,G^{\circ}\right):=\min_{C\in G^{\circ}}\mathbf{\Delta}\!\left(A,C\right)$$

—that is, if $\mathbf{\Delta}\!\left(A,B\right)=\mathbf{\Delta}\!\left(A,G^{\circ}\right)$ —then we must have $B\in\mathfrak{h}(\mathtt{coh}_{G}(A);G)$ .

Proof.

See Section C.2. ∎
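Proposition 3.1 can be confirmed by brute force on small examples. The Python sketch below (our own illustration, with the same literal-string encoding as above and the constants $\bot,\top$ omitted) enumerates every vertex $A$ of the cube over three queries, finds the vertices of $G^{\circ}$ at minimal Hamming distance, and checks that all of them contain $\mathtt{coh}_{G}(A)$, for three small rule sets. It is a finite check, not a proof.

```python
from collections import deque
from itertools import product


def neg(a):
    """The complement a* of a literal written as 'x' or '~x'."""
    return a[1:] if a.startswith("~") else "~" + a


def make_pcr(rules):
    """Adjacency of G, each rule a -> b stored with its contrapositive b* -> a*."""
    edges = {}
    for a, b in rules:
        edges.setdefault(a, set()).add(b)
        edges.setdefault(neg(b), set()).add(neg(a))
    return edges


def up(G, T):
    """T↑: literals reachable from T along edges of G."""
    seen, queue = set(T), deque(T)
    while queue:
        for b in G.get(queue.popleft(), ()):
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def coh(G, T):
    closure = up(G, T)
    return closure - {neg(a) for a in closure}


def vertices(variables):
    """All complete assignments, as frozensets of literals (the cube H)."""
    for bits in product((True, False), repeat=len(variables)):
        yield frozenset(v if b else "~" + v for v, b in zip(variables, bits))


def prop_3_1_holds(G, variables):
    """Every B in G° at minimal Hamming distance from A lies in h(coh_G(A))."""
    model = [u for u in vertices(variables) if up(G, u) == u]   # G°
    for A in vertices(variables):
        d = min(len(A - B) for B in model)                       # Delta(A, G°)
        nearest = [B for B in model if len(A - B) == d]
        if not all(coh(G, A) <= B for B in nearest):
            return False
    return True


VARS = ("x", "y", "z")
RULE_SETS = [[("x", "y"), ("y", "z")], [("x", "~y"), ("z", "y")], [("x", "~x"), ("y", "z")]]
print([prop_3_1_holds(make_pcr(rules), VARS) for rules in RULE_SETS])  # [True, True, True]
```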

Thus, the operation $\mathtt{coh}_{G}({\scriptscriptstyle\bullet})$ yields the “best approximation” of $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t+1}}$ by a convex subset of $\mathbf{M}\big{|}_{\scriptscriptstyle{t+1}}$ , echoing the principle of minimal change as seen through Dalal’s measure [10] of the distance between theories. Moreover:

Proposition 3.2 (Coherent Projection).

Let $G$ be a PCR over $\mathbf{\Sigma}$ . Then the following hold for all $A\subseteq\mathbf{\Sigma}$ :

(a) $\mathtt{coh}_{G}(A)$ is coherent and $\mathtt{coh}_{G}(A)\!\uparrow=\mathtt{coh}_{G}(A)$ ;

(b) $\mathtt{coh}_{G}(\mathtt{coh}_{G}(A))=\mathtt{coh}_{G}(A)$ ;

(c) $A\subseteq\mathtt{coh}_{G}(A)$ whenever $A$ is $G$ -coherent;

(d) $\mathtt{coh}_{G}(A)=A$ if and only if $A$ is $G$ -coherent and $A\!\uparrow=A$ .

In other words, as a self-map of $\mathbf{2}^{\mathbf{\Sigma}}$ , the operator $A\mapsto\mathtt{coh}_{G}(A)$ is an idempotent whose image coincides with $\mathbf{C}(G)$ .

Proof.

See Section C.3. ∎

Note how properties (a) and (c) turn $\mathtt{coh}_{G}({\scriptscriptstyle\bullet})$ into a closure operator on the subspace of $G$ -coherent sets with respect to inference (implication). At the same time, (b) and (d) characterize the set $\mathbf{C}(G)$ of all terms that are closed under inference.
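Properties (a) to (d) can likewise be checked exhaustively over all $2^{6}$ subsets of the literals of three queries. The Python sketch below is our own illustration; it uses the coherence test $A\cap(A\!\uparrow){}^{\ast}=\varnothing$, which in this literal encoding is equivalent to $A\cap A^{\ast}\!\downarrow=\varnothing$, and again omits $\bot,\top$.

```python
from collections import deque
from itertools import chain, combinations


def neg(a):
    """The complement a* of a literal written as 'x' or '~x'."""
    return a[1:] if a.startswith("~") else "~" + a


def make_pcr(rules):
    """Adjacency of G, each rule a -> b stored with its contrapositive b* -> a*."""
    edges = {}
    for a, b in rules:
        edges.setdefault(a, set()).add(b)
        edges.setdefault(neg(b), set()).add(neg(a))
    return edges


def up(G, T):
    """T↑: literals reachable from T along edges of G."""
    seen, queue = set(T), deque(T)
    while queue:
        for b in G.get(queue.popleft(), ()):
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def coh(G, T):
    closure = up(G, T)
    return closure - {neg(a) for a in closure}


def coherent(G, A):
    """A is G-coherent iff no literal of A has its complement in A↑."""
    return not (set(A) & {neg(a) for a in up(G, A)})


def subsets(xs):
    return chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))


def prop_3_2_holds(G, literals):
    for A in map(set, subsets(literals)):
        C = coh(G, A)
        checks = [
            coherent(G, C) and up(G, C) == C,                  # (a)
            coh(G, C) == C,                                    # (b)
            not coherent(G, A) or A <= C,                      # (c)
            (C == A) == (coherent(G, A) and up(G, A) == A),    # (d)
        ]
        if not all(checks):
            return False
    return True


LITS = ["a", "~a", "b", "~b", "c", "~c"]
print(prop_3_2_holds(make_pcr([("a", "b"), ("b", "~c")]), LITS),
      prop_3_2_holds(make_pcr([("a", "~a"), ("b", "c")]), LITS))  # True True
```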

Overall, Equation 11 provides an intriguingly natural way of maintaining an internal model and belief state with a built-in degree of resilience to observations that fail to make immediate sense to the agent given its epistemic state. Finally, the complexity of this computation is the complexity of propagation over $G\big{|}_{\scriptscriptstyle{t+1}}$ , by Proposition 2.41 and the discussion following Definition 2.43.

4. Learning Algorithms for UMAs: Snapshot Structures.

4.1. Qualitative Snapshot Structures.

The goal of this section is to construct a snapshot structure suitable for a scenario in which the learner’s value signal is a ranking function in the sense of Pearl [39, 18] (a special form of Spohn’s OCFs [46]), its values providing a qualitative measure of the degree of irrelevance of the current experience. Thus, an observation with $\varphi\big{|}_{\scriptscriptstyle{t}}=0$ is considered desirable, while $\varphi\big{|}_{\scriptscriptstyle{t}}=1,2,\ldots$ marks an observation as increasingly irrelevant.

4.1.1. Rankings and 2-rankings

Throughout this section we let $\mathbf{\Sigma}$ be a PCS and let $\mathbb{H}$ denote the Hamming cube $O(\mathbf{\Sigma})^{\circ}$ . Also, let $\widehat{\mathds{N}}:=\{0,1,2,\ldots,\infty\}$ . We use a slight variation, introduced in [11], of the notion of a ranking from [39]:

Definition 4.1.

A ranking on $\mathbb{H}$ is a function $\kappa:\mathbf{2}^{\mathbb{H}}\to\widehat{\mathds{N}}$ , satisfying:

•

$\kappa(F)=\min_{\sigma\in F}\kappa(\sigma)$ for all $F\subset\mathbb{H}$ ;

•

$\kappa(\sigma)<\infty$ for some $\sigma\in\mathbb{H}$ ;

•

$\kappa(\varnothing)=\infty$ .

Hereafter, we shall abuse notation, writing $\kappa(\sigma)$ to mean $\kappa(\{\sigma\})$ whenever $\sigma\in\mathbb{H}$ . Note that the minimum value of a ranking $\kappa$ is $\kappa(\mathbb{H})$ . ∎

Remark 4.2.

Note that, since $\mathbf{\Sigma}$ is assumed to be finite, the first requirement may be replaced with the requirement that $\kappa(F_{1}\cup F_{2})=\min\{\kappa(F_{1}),\kappa(F_{2})\}$ for all $F_{1},F_{2}\subset\mathbb{H}$ .

The simplest examples of rankings seem to be:

Example 4.3 (point-mass ranking).

Let $u\in\mathbb{H}$ and $r\in\widehat{\mathds{N}}$ , $r\neq\infty$ . Then the following function $\delta_{u,r}$ is a ranking:

$$\delta_{u,r}(F):=\begin{cases}r,&u\in F,\\ \infty,&u\notin F.\end{cases}$$

Example 4.4 (pointwise minimum).

If $\kappa_{1},\kappa_{2}$ are rankings on $\mathbb{H}$ , then the function $\kappa(F):=\min(\kappa_{1}(F),\kappa_{2}(F))$ is also a ranking.∎
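Definition 4.1 and Examples 4.3 and 4.4 translate into a few lines of Python. In this minimal sketch (our own encoding), a ranking on the cube over two queries is determined by its point ranks, `is_ranking` checks the three requirements of Definition 4.1 over every subset $F\subseteq\mathbb{H}$, and the pointwise minimum of two point-mass rankings is verified to be a ranking.

```python
import math
from itertools import chain, combinations, product

INF = math.inf
CUBE = list(product((0, 1), repeat=2))   # H for two queries


def from_point_ranks(rank_of):
    """kappa(F) = minimum of the point ranks over F, with kappa(empty set) = infinity."""
    return lambda F: min((rank_of[s] for s in F), default=INF)


def point_mass(u, r):
    """delta_{u,r}: rank r on sets containing u, infinity otherwise."""
    return from_point_ranks({s: (r if s == u else INF) for s in CUBE})


def pointwise_min(k1, k2):
    """F -> min(kappa_1(F), kappa_2(F))."""
    return lambda F: min(k1(F), k2(F))


def is_ranking(kappa):
    """Check the requirements of Definition 4.1 on every subset F of H."""
    subsets = chain.from_iterable(combinations(CUBE, k) for k in range(len(CUBE) + 1))
    min_rule = all(kappa(F) == min((kappa([s]) for s in F), default=INF) for F in subsets)
    return min_rule and any(kappa([s]) < INF for s in CUBE) and kappa([]) == INF


kappa = pointwise_min(point_mass((0, 0), 0), point_mass((1, 1), 2))
print(is_ranking(kappa), kappa([(0, 1), (1, 1)]), kappa([(1, 0)]))  # True 2 inf
```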

Recall now the sets $\mathfrak{h}(S)$ from Lemma 2.30. They will help us study the interaction between rankings and concepts:

Definition 4.5.

The concept representation of a ranking $\kappa$ is the function $\mathtt{w}^{\kappa}_{S}:=\kappa\left(\mathfrak{h}(S)\right)$ , where $S$ ranges over subsets of $\mathbf{\Sigma}$ . To simplify notation, we will often write $\mathtt{w}^{\kappa}_{a_{1}a_{2}\cdots a_{k}}:=\mathtt{w}^{\kappa}_{S}$ whenever $S=\{a_{1},\ldots,a_{k}\}$ is explicitly provided. ∎

Remark 4.6.

Note that $\mathtt{w}^{\kappa}_{S}=\infty$ if $S$ is not a $\ast$ -selection. Also, $\mathtt{w}^{\kappa}_{\varnothing}=\kappa(\mathbb{H})$ , the minimum value of $\kappa$ .

Lemma 4.7 (triangle inequality).

For any ranking $\kappa$ on $\mathbb{H}$ , one has $\mathtt{w}^{\kappa}_{ac{{}^{\scriptscriptstyle\ast}}}\geq\min\left\{\mathtt{w}^{\kappa}_{ab{{}^{\scriptscriptstyle\ast}}},\mathtt{w}^{\kappa}_{bc{{}^{\scriptscriptstyle\ast}}}\right\}$ for all $a,b,c\in\mathbf{\Sigma}$ .

Proof.

See Section D.1. ∎

We are interested in studying the interactions between rankings on $\mathbb{H}$ and non-degenerate poc-graph structures on $\mathbf{\Sigma}$ . A weakened notion of ranking is required for this purpose.

Definition 4.8.

A 2-ranking on $\mathbf{\Sigma}$ is a symmetric matrix $\mathtt{w}_{{\scriptscriptstyle\bullet}}=(\mathtt{w}_{ab})_{a,b\in\mathbf{\Sigma}}$ with entries in $\widehat{\mathds{N}}$ , satisfying the following for all $a,b,c\in\mathbf{\Sigma}$ :

(1) $\mathtt{w}_{\mathbf{0}a}=\mathtt{w}_{aa{{}^{\scriptscriptstyle\ast}}}=\infty$ ;

(2) $\min(\mathtt{w}_{aa},\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}a{{}^{\scriptscriptstyle\ast}}})=\min(\mathtt{w}_{bb},\mathtt{w}_{b{{}^{\scriptscriptstyle\ast}}b{{}^{\scriptscriptstyle\ast}}})<\infty$ ;

(3) $\mathtt{w}_{aa}=\min\left(\mathtt{w}_{ab},\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}\right)$ ;

(4) $\mathtt{w}_{ac{{}^{\scriptscriptstyle\ast}}}\geq\min\left(\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}},\mathtt{w}_{bc{{}^{\scriptscriptstyle\ast}}}\right)$ .

We will say that a ranking $\kappa$ agrees with $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ if $\mathtt{w}^{\kappa}_{ab}=\mathtt{w}_{ab}$ for all $a,b\in\mathbf{\Sigma}$ . Also, we will abbreviate as follows: $\mathtt{w}_{a}:=\mathtt{w}_{aa}$ and $\mathtt{w}_{\varnothing}:=\min(\mathtt{w}_{a},\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}})$ , for all $a\in\mathbf{\Sigma}$ . Finally, note that $\mathtt{w}_{a}=\min\{\mathtt{w}_{a0{{}^{\scriptscriptstyle\ast}}},\mathtt{w}_{a0}\}$ must also hold for all $a\in\mathbf{\Sigma}$ , by virtue of requirements (1) and (3).∎
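One direction of Proposition 4.11 below, that the 2-restriction of a ranking is a 2-ranking, can be spot-checked numerically. The Python sketch below (our own illustration) draws random rankings on the cube over three queries, computes $\mathtt{w}^{\kappa}_{ab}=\kappa(\mathfrak{h}(ab))$ for all literal pairs, and tests requirements (2) to (4) of Definition 4.8 together with symmetry; requirement (1) is checked only for $\mathtt{w}_{aa^{\ast}}$, since the constant $\mathbf{0}$ is omitted from the encoding. Requirement (4) is Lemma 4.7.

```python
import math
import random
from itertools import product

INF = math.inf
N = 3
CUBE = list(product((0, 1), repeat=N))
LITS = [(i, v) for i in range(N) for v in (0, 1)]   # literal (i, v): "query i has value v"


def star(a):
    return (a[0], 1 - a[1])


def two_restriction(rank_of):
    """kappa^(2): w_ab = kappa(h({a, b})), kappa(F) being the minimum point rank over F."""
    def rank_of_set(S):
        return min((rank_of[u] for u in CUBE if all(u[i] == v for i, v in S)), default=INF)
    return {(a, b): rank_of_set({a, b}) for a in LITS for b in LITS}


def is_2_ranking(w):
    """Requirements of Definition 4.8, restricted to literals (constants omitted)."""
    w_empty = {min(w[a, a], w[star(a), star(a)]) for a in LITS}
    if len(w_empty) != 1 or INF in w_empty:                               # (2)
        return False
    for a, b in product(LITS, repeat=2):
        if w[a, b] != w[b, a] or w[a, star(a)] != INF:                    # symmetry, (1)
            return False
        if w[a, a] != min(w[a, b], w[a, star(b)]):                        # (3)
            return False
        if any(w[a, star(c)] < min(w[a, star(b)], w[b, star(c)]) for c in LITS):  # (4)
            return False
    return True


def random_ranking(seed):
    rng = random.Random(seed)
    ranks = {u: rng.choice([0, 1, 2, 3, INF]) for u in CUBE}
    ranks[rng.choice(CUBE)] = rng.randint(0, 3)   # ensure at least one finite rank
    return ranks


print(all(is_2_ranking(two_restriction(random_ranking(s))) for s in range(50)))  # True
```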

Of course, the idea is to have a 2-ranking play the role of a snapshot weight, from which one needs to derive a non-degenerate PCR. In our learning setting, the best one could do is to derive from the samples of the value signal $\varphi$ the 2-ranking $(\mathtt{w}^{\varphi}_{ab})_{a,b\in\mathbf{\Sigma}}$ . The main question is, then, how much of the original $\varphi$ could be recovered from this information. The following family of PCRs helps answer this question:

Proposition 4.9.

Suppose $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ is a 2-ranking, and let $\mathtt{w}_{\varnothing}\leq\delta\in\widehat{\mathds{N}}$ . Consider the PCRs on $\mathbf{\Sigma}$ defined by:

[TABLE]

for $\delta<\infty$ , and by:

[TABLE]

Then $\mathbf{Res}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\delta\right)$ is a non-degenerate PCR for all $\delta\in\widehat{\mathds{N}}$ .

Proof.

See Section D.2. ∎

A surprising consequence of the non-degeneracy of these PCRs is the following corollary, leading to the conclusion that every 2-ranking has a ranking that agrees with it:

Corollary 4.10.

Let $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ be a 2-ranking on $\mathbf{\Sigma}$ , and let $a,b\in\mathbf{\Sigma}$ . Set $r:=\mathtt{w}_{ab}$ . Then there exists a vertex $u\in\mathbb{H}$ such that the point mass ranking $\nu=\delta_{u,r}$ satisfies $\mathtt{w}^{\nu}_{pq}\geq\mathtt{w}_{pq}$ for all $p,q\in\mathbf{\Sigma}$ .

Proof.

See Section D.3. ∎

Proposition 4.11.

Let $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ be a symmetric $\widehat{\mathds{N}}$ -valued matrix. Then $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ is a 2-ranking if and only if there exists a ranking with which it agrees. Moreover, if $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ is a 2-ranking, then there exists one and only one ranking,

[TABLE]

that agrees with $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ and satisfies $\widehat{\mathtt{w}}\leq\kappa$ for every ranking $\kappa$ that agrees with $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ .

Proof.

See Section D.4. ∎

The upshot of the last proposition is that, henceforth, any 2-ranking may be treated as encoding a ranking. Formally:

Definition 4.12.

Suppose $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ is a 2-ranking and $\kappa$ is a ranking. The completion of $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ is the ranking $\widehat{\mathtt{w}}$ from the preceding proposition. The 2-restriction of $\kappa$ is the 2-ranking, denoted $\kappa^{\scriptscriptstyle{(2)}}$ , obtained from $\kappa$ via the concept representation, that is: $\kappa^{\scriptscriptstyle{(2)}}_{ab}:=\mathtt{w}^{\kappa}_{ab}$ for all $a,b\in\mathbf{\Sigma}$ . The 2-closure of $\kappa$ is the ranking, denoted $\widehat{\kappa}$ , obtained from $\kappa$ as the completion of its 2-restriction. In particular one has $\kappa\geq\widehat{\kappa}$ .∎

4.1.2. Derived PCRs and their duals.

We now introduce the PCR used in the qualitative snapshot structure. As systems of defaults, these PCRs are strengthened (more restrictive) versions of the (ranked) default systems constructed by Goldszmidt and Pearl in [18], and they satisfy an analogous characterization.

Proposition 4.13.

Suppose $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ is a 2-ranking. For $0\leq\delta<\infty$ , let its derived PCR be defined by:

[TABLE]

and for $\delta=\infty$ let it be defined by:

[TABLE]

Then, $\mathbf{Der}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\delta\right)$ is a non-degenerate PCR for all $\delta\in\widehat{\mathds{N}}$ .

Proof.

Let $G=\mathbf{Der}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\delta\right)$ and $R=\mathbf{Res}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\delta\right)$ . Once again, the basic properties of a PCR are baked into the definition of $G$ . Furthermore, observe that $ab\in G$ implies $ab\in R$ (though not the other way around). In particular, we have $N(G)\subseteq N(R)$ and it follows that $N(G)\cap N(G){{}^{\scriptscriptstyle\ast}}\subseteq N(R)\cap N(R){{}^{\scriptscriptstyle\ast}}=\varnothing$ , as required. ∎

Definition 4.14.

Proposition 4.11 and Definition 4.12 make it possible for us to abuse notation and talk about the residual and derived PCRs of a ranking by setting $\mathbf{Res}\!\left(\kappa;\delta\right):=\mathbf{Res}\!\left(\kappa^{\scriptscriptstyle{(2)}};\delta\right)$ , and $\mathbf{Der}\!\left(\kappa;\delta\right):=\mathbf{Der}\!\left(\kappa^{\scriptscriptstyle{(2)}};\delta\right)$ , dropping all mention of $\delta$ when $\delta=0$ , as before. Of course, $\kappa$ may be replaced with its 2-closure $\widehat{\kappa}$ throughout.∎

We proceed to study properties of derived PCRs and their duals, to verify their utility to our representation problem. Specifically, we are interested in the geometry of level sets, as we try to answer the question: how well does the 2-restriction of a ranking $\kappa$ capture the set of global minimum points of $\kappa$ (the most meaningful states according to $\kappa$ )?

Definition 4.15.

Given an integer $\epsilon\geq 0$ and a 2-ranking $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ , denote:

[TABLE]

The set $\mathtt{M}(\mathtt{w}_{{\scriptscriptstyle\bullet}}):=\mathtt{M}(\mathtt{w}_{{\scriptscriptstyle\bullet}};0)$ will be referred to as the minset of $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ . By virtue of Proposition 4.11, this notion extends to rankings as follows:

[TABLE]

with $\mathtt{M}(\kappa):=\mathtt{M}(\kappa;0)$ being the minset of $\kappa$ .∎

It is clear that a global minimum point of a ranking $\kappa$ must contain $\mathtt{M}(\kappa)$ . Hence, $\mathfrak{h}(\mathtt{M}(\kappa))$ contains all global minima of $\kappa$ , but what does this have to do with the derived PCR and its dual? The main result is as follows:

Proposition 4.16.

Let $\kappa$ be a ranking on $\mathbb{H}$ and set $G=\mathbf{Der}\!\left(\kappa\right)$ and $M=\mathtt{M}(\kappa)$ . Let $F$ and $\widehat{F}$ be the sets of global minima of $\kappa$ and $\widehat{\kappa}$ , respectively. Then $F\subseteq\widehat{F}\subseteq G^{\circ}$ and $\widehat{F}=\mathfrak{h}(M;G)$ . Moreover, $\widehat{F}$ is the convex hull of $F$ in $\mathtt{Dual}\!\left(G\right)$ .

Proof.

See Section D.5. ∎

On inspection, the details of the proof suggest that $\widehat{\kappa}$ is, for lack of a better word, a form of convex smoothing of $\kappa$ : the last proposition shows how the collection of possibly disparate minimum points of $\kappa$ coalesces into a convex plateau of minimum points of $\widehat{\kappa}$ in the dual space of the derived PCR.

4.1.3. A Snapshot Structure to Learn a Ranking.

We return to our learning problem. Suppose $\varphi$ is a fixed ranking on $\mathbb{H}$ , and we are given a sequence of samples $\varphi\big{|}_{\scriptscriptstyle{t}}=\varphi(\mathtt{Obs}\big{|}_{\scriptscriptstyle{t}})$ , where $\mathtt{Obs}\big{|}_{\scriptscriptstyle{t}}\in\mathbb{H}$ are the observations made by our agent. We will assume $\varphi\big{|}_{\scriptscriptstyle{t}}<\infty$ for all $t$ , reserving $\varphi(u)=\infty$ for the impossible observations.

We must define the weight update that takes place in response to an incoming observation, as well as the weight extension that takes place when a query is added to the sensorium.

Weight update (static case).

For our snapshot structure, we propose the following update rule for the snapshot weights:

[TABLE]

By Example 4.4, $\mathtt{w}_{{\scriptscriptstyle\bullet}}\big{|}_{\scriptscriptstyle{t}}$ is a 2-ranking for every $t\geq 0$ , giving rise to a non-degenerate PCR of the form

[TABLE]

Since the sequence of weights is pointwise non-increasing, its convergence is guaranteed. Moreover, exposure to (at most) $N=\binom{\left|\mathbf{\Sigma}\right|}{2}-\tfrac{\left|\mathbf{\Sigma}\right|}{2}$ observations covering all pairs $\{a,b\}$ with $\{a,a{{}^{\scriptscriptstyle\ast}}\}\cap\{b,b{{}^{\scriptscriptstyle\ast}}\}=\varnothing$ , sampling a minimum rank world in $\mathfrak{h}(ab)$ for each pair $\{a,b\}$ at least once, will result in $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ coinciding with $\varphi^{\scriptscriptstyle{(2)}}$ . This motivates the question “How much less exposure is required for delivering the same result on average, in, say, an appropriately formulated PAC setting?”, and emphasizes the good fit of ranking-based snapshot structures to settings featuring a teacher.
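Because the update formula itself is not reproduced in this text, the rule in the following Python sketch is an assumption on our part: each weight $\mathtt{w}_{ab}$ is lowered to the observed rank whenever the observation falls in $\mathfrak{h}(ab)$ and is kept otherwise, which fits the appeal to Example 4.4 and the pointwise non-increasing weights. The toy ranking (rank equals the number of 1s) is also ours. Feeding only the seven worlds of rank at most 2, which include a minimum-rank world of every $\mathfrak{h}(ab)$, already recovers $\varphi^{(2)}$.

```python
import math
from itertools import product

INF = math.inf
N = 3
CUBE = list(product((0, 1), repeat=N))
LITS = [(i, v) for i in range(N) for v in (0, 1)]   # literal (i, v): "query i has value v"


def holds(u, a):
    return u[a[0]] == a[1]


def update(w, obs, phi):
    """ASSUMED update: w_ab <- min(w_ab, phi) if obs lies in h(ab); otherwise w_ab is kept."""
    return {(a, b): (min(old, phi) if holds(obs, a) and holds(obs, b) else old)
            for (a, b), old in w.items()}


def two_restriction(phi):
    """phi^(2)_ab: the minimum of phi over h(ab)."""
    return {(a, b): min((phi[u] for u in CUBE if holds(u, a) and holds(u, b)), default=INF)
            for a in LITS for b in LITS}


phi = {u: sum(u) for u in CUBE}                  # toy ranking: rank = number of 1s
w = {(a, b): INF for a in LITS for b in LITS}    # nothing observed yet
for u in CUBE:
    if sum(u) <= 2:                              # one minimum-rank world per pair suffices
        w = update(w, u, phi[u])
print(w == two_restriction(phi))                 # True
```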

4.2. Statistical Integrators of a Real-Valued Signal.

The original suggestion of [19] for maintaining a system of weights in the role of a snapshot structure was based on the idea that $\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}}$ should be the empirical estimate at time $t$ of the probability of the event $a\wedge b$ , so that $ab\in G\big{|}_{\scriptscriptstyle{t}}$ could be put on record if and only if $\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}\big{|}_{\scriptscriptstyle{t}}<\min(\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}},\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b{{}^{\scriptscriptstyle\ast}}}\big{|}_{\scriptscriptstyle{t}},\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b}\big{|}_{\scriptscriptstyle{t}},\tau_{ab}\big{|}_{\scriptscriptstyle{t}})$ , where $\tau_{ab}\big{|}_{\scriptscriptstyle{t}}$ is a fixed threshold. That is, the implication $a\rightarrow b$ is put on record whenever the event $a\wedge\neg b$ has sufficiently low empirical probability. We have since found that the improved formalization provided by Propositions 2.17 and 2.24 enables the use of a far more general weight update scheme, capable of incorporating a value signal into the learner’s reasoning while also taking into account the observed frequency of events.
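The rule of [19], as paraphrased above, can be sketched in a few lines of Python. As simplifications of our own, a single constant threshold `tau` replaces the pair- and time-dependent $\tau_{ab}\big{|}_{t}$, observations are complete sets of literal strings, and frequencies are computed in one batch rather than incrementally. In the toy data the event $x\wedge\neg y$ never occurs, so the rule records $x\to y$ together with its contrapositive.

```python
from collections import Counter


def neg(a):
    """The complement a* of a literal written as 'x' or '~x'."""
    return a[1:] if a.startswith("~") else "~" + a


def record_implications(observations, literals, tau):
    """Put a -> b on record iff the empirical frequency of a & ~b is below those of
    a & b, ~a & ~b, ~a & b, and below the threshold tau."""
    n = len(observations)
    counts = Counter((a, b) for obs in observations for a in obs for b in obs)

    def w(a, b):
        return counts[a, b] / n     # empirical probability of the event a & b

    G = set()
    for a in literals:
        for b in literals:
            if b in (a, neg(a)):
                continue
            if w(a, neg(b)) < min(w(a, b), w(neg(a), neg(b)), w(neg(a), b), tau):
                G.add((a, b))
    return G


data = [{"x", "y"}] * 3 + [{"~x", "y"}] * 2 + [{"~x", "~y"}] * 3   # x & ~y never occurs
print(sorted(record_implications(data, ["x", "~x", "y", "~y"], tau=0.1)))
# [('x', 'y'), ('~y', '~x')]
```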

4.2.1. Real-valued 2-weights.

Once again, the learner is presented with a sequence of observations $u_{t}\in\mathbb{H}$ , accompanied by the signal $\varphi\big{|}_{\scriptscriptstyle{t}}=\varphi(u_{t})$ . This time we require that the value signal $\varphi\big{|}_{\scriptscriptstyle{t}}$ presented to the agent at time $t$ is a real number greater than or equal to $1$ , where a higher value of $\varphi$ indicates a more meaningful state of the observed system.

Definition 4.17.

A real-valued 2-weight on a PCS $\mathbf{\Sigma}$ is a symmetric, real-valued function $\mathtt{w}_{{\scriptscriptstyle\bullet}}=(\mathtt{w}_{ab})_{a,b\in\mathbf{\Sigma}}$ on $\mathbf{\Sigma}\times\mathbf{\Sigma}$ , satisfying the following requirements for all $a,b,c\in\mathbf{\Sigma}$ :

(1) $\mathtt{w}_{ab}\geq 0$ , $\mathtt{w}_{0a}=0$ and $\mathtt{w}_{aa{{}^{\scriptscriptstyle\ast}}}=0$ ;

(2) $\mathtt{w}_{\varnothing}:=\mathtt{w}_{a}+\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}}=\mathtt{w}_{b}+\mathtt{w}_{b{{}^{\scriptscriptstyle\ast}}}$ ;

(3) $\mathtt{w}_{a}:=\mathtt{w}_{aa}=\mathtt{w}_{ab}+\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}$ ;

(4) $\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}+\mathtt{w}_{bc{{}^{\scriptscriptstyle\ast}}}+\mathtt{w}_{ca{{}^{\scriptscriptstyle\ast}}}=\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b}+\mathtt{w}_{b{{}^{\scriptscriptstyle\ast}}c}+\mathtt{w}_{c{{}^{\scriptscriptstyle\ast}}a}$ ;

(5) $\mathtt{w}_{ac{{}^{\scriptscriptstyle\ast}}}+\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}c}\leq\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}+\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b}+\mathtt{w}_{bc{{}^{\scriptscriptstyle\ast}}}+\mathtt{w}_{b{{}^{\scriptscriptstyle\ast}}c}$ .

When $\mathtt{w}_{ab}=0$ for all $a,b\in\mathbf{\Sigma}$ , we say $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ is trivial.∎

The following example provides motivation for the definition:

Example 4.18.

Suppose $(\mathbf{X},\mathscr{B},\mu)$ is a measure space and $\varphi:\mathbf{X}\to\mathds{R}$ is a non-negative function in $L_{1}(\mu)$ . Suppose $\rho:\mathbf{\Sigma}\to\mathscr{B}$ is a PCS morphism, where $\mathscr{B}$ is viewed as a sub-PCS of $\mathbf{2}^{\mathbf{X}}$ (recall Example 2.3). Then $\mathtt{w}_{ab}:=\int_{\rho(a)\cap\rho(b)}\varphi\mathrm{d}\mu$ is a real-valued 2-weight. Indeed, since the integral of a non-negative function is non-negative, additive over disjoint unions and monotone under inclusion, requirements (1) to (5) are corollaries of the following set-theoretic facts, applied to $A=\rho(a)$ , $B=\rho(b)$ and $C=\rho(c)$ , respectively:

(1) $\varnothing\cap A=\varnothing$ and $A\cap(\mathbf{X}\smallsetminus A)=\varnothing$ ;

(2) $\mathbf{X}=A\cup(\mathbf{X}\smallsetminus A)=B\cup(\mathbf{X}\smallsetminus B)$ , both unions being disjoint;

(3) $A=(A\cap B)\cup(A\smallsetminus B)$ , a disjoint union;

(4) $(A\smallsetminus B)\cup(B\smallsetminus C)\cup(C\smallsetminus A)=(B\smallsetminus A)\cup(C\smallsetminus B)\cup(A\smallsetminus C)$ , where on each side the three sets are pairwise disjoint (see Figure 2);

(5) $A\vartriangle C\subseteq(A\vartriangle B)\cup(B\vartriangle C)$ ,

where $A\vartriangle B:=(A\smallsetminus B)\cup(B\smallsetminus A)$ , for short.
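Example 4.18 can be checked mechanically on a small finite case. The sketch below takes a hypothetical six-point space with counting measure, a hypothetical non-negative $\varphi$, and a realization $\rho$ of a small PCS in which `"0"` is realized by $\varnothing$ and `"x*"` by the complement of $\rho(x)$. It then verifies requirements (1) to (5) of Definition 4.17 for every triple of queries. All names and data are ours.

```python
from itertools import product
from math import isclose

# Hypothetical finite measure space: six points, counting measure, and a
# non-negative value signal PHI.
X = frozenset(range(6))
PHI = {0: 1.0, 1: 2.0, 2: 3.0, 3: 1.5, 4: 0.5, 5: 4.0}

# Hypothetical realization rho: "0" is the least element (empty set),
# "x*" is realized by the complement of rho(x).
BASE = {"a": {0, 1, 2}, "b": {1, 2, 3}, "c": {2, 4, 5}}
RHO = {"0": frozenset(), "0*": X}
for name, subset in BASE.items():
    RHO[name] = frozenset(subset)
    RHO[name + "*"] = X - frozenset(subset)


def star(q):
    """Complement of a query name."""
    return q[:-1] if q.endswith("*") else q + "*"


def w(a, b):
    """w_ab = integral of PHI over rho(a) & rho(b) (counting measure)."""
    return sum(PHI[x] for x in RHO[a] & RHO[b])


def check_axioms():
    """Assert requirements (1)-(5) of Definition 4.17 for all triples."""
    S = list(RHO)
    total = w("0*", "0*")  # the integral of PHI over all of X
    for a, b, c in product(S, repeat=3):
        assert w(a, b) >= 0 and isclose(w(a, b), w(b, a))          # symmetric, (1)
        assert w("0", a) == 0 and w(a, star(a)) == 0                # (1)
        assert isclose(w(a, a) + w(star(a), star(a)), total)        # (2)
        assert isclose(w(a, a), w(a, b) + w(a, star(b)))            # (3)
        assert isclose(w(a, star(b)) + w(b, star(c)) + w(c, star(a)),
                       w(star(a), b) + w(star(b), c) + w(star(c), a))  # (4)
        assert (w(a, star(c)) + w(star(a), c)
                <= w(a, star(b)) + w(star(a), b)
                + w(b, star(c)) + w(star(b), c) + 1e-12)            # (5)
    return True


print(check_axioms())  # prints True
```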

Example 4.19 (point mass weight).

Similarly to the qualitative setting, the simplest example of a weight of this form is given by a point-mass measure on $\mathbb{H}$ :

[TABLE]

where $F\subset\mathbb{H}$ (compare with Example 4.3).∎

4.2.2. Derived PCRs and their duals.

The resulting notion of a derived PCR requires a system $\tau_{\scriptscriptstyle\bullet}$ of threshold values, denoted $\tau_{ab}\big{|}_{\scriptscriptstyle{t}}\in(0,1)$ , $a,b\in\mathbf{\Sigma}$ , satisfying the identities

[TABLE]

for all $a,b\in\mathbf{\Sigma}$ and $t\geq 0$ . This makes it possible to construct a non-degenerate PCR as follows:

Proposition 4.20.

For any choice of threshold values $\tau_{\scriptscriptstyle\bullet}$ satisfying Equation 24, if $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ is non-trivial, then

[TABLE]

defines a non-degenerate PCR.

Proof.

See Section E.1.∎

Let $G=\mathbf{Der}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}}\right)$ for a real-valued 2-weight $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ . A notion analogous to that of a minset may be considered in the real-valued setting, taking into account the reversal of the value hierarchy (now, bigger values of $\varphi$ are considered the most significant):

[TABLE]

The argument that $M=\mathtt{M}(\mathtt{w})$ is $G$ -coherent and forward-closed, for any choice of the thresholds $\tau_{\scriptscriptstyle\bullet}$ , is the same as the one given for minsets in the qualitative setting (Lemma D.1 in Section D.5), upon reversing the relevant inequalities. This time around, however, the non-empty convex subset $F=\mathfrak{h}(M;G)$ of $\mathtt{Dual}\!\left(G\right)$ does not directly relate to extreme points of the value signal $\varphi$ in $\mathbb{H}$ , but, rather, to a notion of center of mass of $G^{\circ}$ with respect to $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ , seen as a representation of the distribution of the value signal over $\mathbb{H}$ .

4.2.3. Snapshot update.

As in the qualitative setting, in the real-valued setting we assemble our estimate of the integrals of the observed value signal from point masses, this time replacing minimization with convex combinations. The update rule for a discounted integrator snapshot takes the form:

[TABLE]

where the $q\big{|}_{\scriptscriptstyle{t}}\in(0,1]$ are the discount coefficients, $t\geq 1$ . The fact that $\mathtt{w}_{{\scriptscriptstyle\bullet}}\big{|}_{\scriptscriptstyle{t+1}}$ is a convex combination of real-valued 2-weights ensures that $\mathtt{w}_{{\scriptscriptstyle\bullet}}\big{|}_{\scriptscriptstyle{t+1}}$ is a real-valued 2-weight as well.

Types of update.

We studied two variants of the discounted integrator snapshot:

(1) **Empirical Snapshot.** In this case, one sets $q\big{|}_{\scriptscriptstyle{t}}:=\tfrac{t+1}{t+2}$ , resulting in

[TABLE]

which is the empirical estimate for the integral of $\varphi$ over $\rho(a)\cap\rho(b)$ . For this snapshot type, we used fixed thresholds $\tau_{ab}=\tau$ .

(2) **Fixed Discount Snapshot.** Here one sets $q\big{|}_{\scriptscriptstyle{t}}:=q$ , a constant, playing the role of a rate at which information acquired about the signal ‘fades’ unless continually reinforced by incoming observations:

[TABLE]

The eventual purpose of using an update of this form is to accommodate settings where $\varphi$ has multiple peaks, as well as, possibly, the dynamic setting, provided the value signal changes sufficiently slowly.
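Both snapshot types can be sketched with a single update loop. The displayed update is a [TABLE] in this copy, so we assume it has the form $\mathtt{w}\big|_{t+1}=q\big|_t\,\mathtt{w}\big|_t+(1-q\big|_t)\,\varphi\big|_{t+1}\,\delta_{u_{t+1}}$. This matches the recursion for $\mathrm{Y}_{ab}$ used in the PAC discussion of this section. We also assume the snapshot is initialized with the point mass of the first observation and that $q$ is indexed from $t=0$. The helper names are hypothetical.

```python
def point_mass(u, phi, queries):
    """Point-mass 2-weight at the observed world u (the set of queries true
    now): value phi on pairs (a, b) that both hold in u, 0 elsewhere."""
    return {(a, b): (phi if a in u and b in u else 0.0)
            for a in queries for b in queries}


def run_snapshot(stream, queries, q_of_t):
    """Discounted integrator w|t+1 = q_t * w|t + (1 - q_t) * point_mass(u_{t+1}).
    stream is a list of (u_t, phi_t); w|0 is taken to be the first point mass."""
    (u0, phi0), rest = stream[0], stream[1:]
    w = point_mass(u0, phi0, queries)
    for t, (u, phi) in enumerate(rest):
        q = q_of_t(t)
        pm = point_mass(u, phi, queries)
        w = {k: q * w[k] + (1.0 - q) * pm[k] for k in w}  # convex combination
    return w


def empirical(t):
    """q_t = (t+1)/(t+2): the snapshot is the running mean of the point masses."""
    return (t + 1) / (t + 2)


def fixed(q):
    """Constant discount q: older observations fade geometrically."""
    return lambda t: q


queries = ["a", "a*", "b", "b*"]
stream = [({"a", "b"}, 2.0), ({"a*", "b"}, 1.0),
          ({"a", "b"}, 4.0), ({"a*", "b*"}, 1.0)]
w_emp = run_snapshot(stream, queries, empirical)
w_fix = run_snapshot(stream, queries, fixed(0.5))
print(round(w_emp["a", "b"], 6), w_fix["a", "b"])  # prints 1.5 1.25
```

The empirical value $1.5$ is the mean of the samples $2,0,4,0$ of $\varphi\cdot\delta_{u_t}(\mathfrak{h}(ab))$. The fixed-discount value weighs recent samples more heavily.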

PAC learning guarantees.

The notion of probably approximately correct (PAC) learning introduced by Valiant [48] is one framework within which the quality of UMAs based on real-valued snapshots could be discussed. The assumptions of this setting are that the observations $(u_{t})_{t\geq 0}$ are i.i.d. samples of a fixed distribution on $\mathbb{H}$ , in which case, for any fixed pair $a,b\in\mathbf{\Sigma}$ with $\{a,a{{}^{\scriptscriptstyle\ast}}\}\cap\{b,b{{}^{\scriptscriptstyle\ast}}\}=\varnothing$ , one could think of the sequence of input values $\mathrm{X}_{ab}\big{|}_{\scriptscriptstyle{t}}:=\varphi\big{|}_{\scriptscriptstyle{t}}\cdot\delta_{u_{t}}(\mathfrak{h}(ab))$ as a sequence of i.i.d. samples of a random variable $\mathrm{X}_{ab}\in[0,A]$ , where $A$ is an upper bound on the value signal $\varphi$ . Equation 27 then lets us think of $\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}}$ as random variables $\mathrm{Y}_{ab}\big{|}_{\scriptscriptstyle{t}}$ constructed according to $\mathrm{Y}_{ab}\big{|}_{\scriptscriptstyle{t+1}}=q\big{|}_{\scriptscriptstyle{t}}\mathrm{Y}_{ab}\big{|}_{\scriptscriptstyle{t}}+(1-q\big{|}_{\scriptscriptstyle{t}})\mathrm{X}_{ab}\big{|}_{\scriptscriptstyle{t+1}}$ . Applying induction one immediately verifies that $\mathbb{E}\left[\mathrm{Y}_{ab}\big{|}_{\scriptscriptstyle{t}}\right]=\mathbb{E}\left[\mathrm{X}_{ab}\right]$ for all $t\geq 0$ . It thus becomes reasonable to ask how many samples are required in order to bring the probability that $\left|\mathrm{Y}_{ab}\big{|}_{\scriptscriptstyle{t}}-\mathbb{E}\left[\mathrm{X}_{ab}\right]\right|>\varepsilon$ below a specified threshold. Valiant [48] observed long ago that Chernoff bounds are a powerful tool for answering such questions. Computing Chernoff bounds for our setting yields:

Proposition 4.21 (PAC learning in empirical snapshots).

Given $\delta>0$ , the empirical snapshot learning mechanism attains a precision of $\delta$ on all weights, with probability $1-\delta$ , from a number of i.i.d. random samples that is at most linear in $\tfrac{1}{\delta}$ , at a rate depending only on the value signal.

Proof.

See Section E.2.∎
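For a single empirical-snapshot weight, the question posed before Proposition 4.21 also has a textbook answer: Hoeffding's inequality for $n$ i.i.d. samples in $[0,A]$ gives $P(|\bar{\mathrm{X}}-\mathbb{E}[\mathrm{X}]|>\varepsilon)\leq 2e^{-2n\varepsilon^{2}/A^{2}}$. The sketch below inverts this to find a sufficient $n$. This is our illustration, not the bound proved in Section E.2. It keeps the precision $\varepsilon$ and the confidence level $\delta$ separate, and the optional union bound over several pairs is also our addition.

```python
from math import ceil, log


def hoeffding_samples(A, eps, delta, pairs=1):
    """Samples n sufficient for P(|mean - E[X]| > eps) <= delta when the
    samples are i.i.d. in [0, A], from 2 * exp(-2 n eps^2 / A^2) <= delta.
    With pairs > 1, a union bound covers that many weights simultaneously."""
    return ceil(A * A * log(2 * pairs / delta) / (2 * eps * eps))


print(hoeffding_samples(1.0, 0.1, 0.05))  # prints 185
```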

Our simulation results indicate that similar guarantees could be expected for the discounted setting, but the standard Chernoff-inspired approaches for leveraging the independence of the observations do not seem to work. Since discounted snapshot learning makes it easier for the representation to recover from false implications, it is important to ascertain whether or not a result of the form of Proposition 4.21 can be proved and, if not, in what circumstances it might fail.

Other learning scenarios.

The PAC learning guarantees of the preceding paragraph are predicated on the assumption that the sequence of observations is statistically independent. This assumption becomes unreasonable for an observer of a system whose state evolves continuously over time, subject to some internal dynamics, in which case it is often unlikely that contiguous observations will be uncorrelated.

A fairly general model of such settings is provided by Markov chains [42], where the underlying Markov process models the (uncertain) dynamics of the observed system. In our setting, one regards $\rho{{}^{\scriptscriptstyle\ast}}(\mathbf{X})$ , the set of observable possible worlds in $\mathbb{H}(\mathbf{\Sigma})$ (Section 2.1.1), as the set of states of a fixed (albeit unknown) Markov process. Then, by the ergodic theorem for Markov chains [15], one has:

Proposition 4.22.

Suppose the sequence of observations $u_{t}\in\mathbb{H}$ is sampled from an aperiodic, irreducible, positive-recurrent Markov chain with limiting distribution $\pi$ . Then the empirical snapshot weights $\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}}$ learned from the constant value signal $\varphi\big{|}_{\scriptscriptstyle{t}}=1$ converge to the marginals $\int_{\rho(a)\cap\rho(b)}\mathrm{d}\pi$ , for all $a,b\in\mathbf{\Sigma}$ .∎

In particular, any thresholded implications derived from the real-valued 2-weight $\mathtt{w}_{ab}:=\int_{\rho(a)\cap\rho(b)}\mathrm{d}\pi$ will be recovered in this process.
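For a finite chain given by an explicit transition matrix, the limit in Proposition 4.22 can be computed directly. The sketch below finds $\pi$ by power iteration, assuming the chain is irreducible and aperiodic as in the proposition. It then evaluates the limiting weight of a pair from the sets of states realizing $a$ and $b$. The matrix and state sets are toy values, and the function names are hypothetical.

```python
def stationary(P, iters=10_000, tol=1e-12):
    """Limiting distribution of a finite Markov chain by power iteration
    (pi <- pi P). Assumes P is row-stochastic, irreducible and aperiodic."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def limit_weight(pi, A, B):
    """Limit of the empirical weight w_ab under phi = 1: the pi-mass of the
    states in A & B, where A and B are the states realizing a and b."""
    return sum(pi[s] for s in A & B)


P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
print([round(x, 6) for x in pi])  # prints [0.833333, 0.166667]
```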

Finally, it follows from the decomposition theorem for Markov chains [15] that the ergodicity assumption in the above proposition does not impose undue restrictions on our model, as we only expect an agent to learn implications from recurring observations anyway. We also note that the special case of lazy random walks guarantees an exponential rate of convergence to the limiting distribution in many interesting cases (see Theorem 5.1 of [32] and Theorem 9 of [42]).

5. Simulations.

We present two kinds of simulation studies. Section 5.2 illustrates the preceding results about learning with different snapshot types in a sample of ‘toy’ settings. Section 5.3 explains how to construct simple UMA-based binary agents, whose performance is considered in Section 5.4.

5.1. Simulation settings.

Each setting considered in Section 5.2 consists of an observer/agent $\mathscr{A}$ situated in a discrete environment, $\mathbf{E}$ . For simplicity, the queries assigned to $\mathscr{A}$ are functions of the agent’s current position in the environment, which we denote by $\mathtt{pos}(t)\in\mathbf{E}$ . Let $[N]:=\{0,\ldots,N\}$ . The environments and sensory endowments we consider are:

• **Discretized interval with GPS.** Here $\mathbf{E}=[N]$ , and $\mathscr{A}$ has queries $\mathbb{A}=\{a_{1},\ldots,a_{N}\}$ , with $a_{i}$ holding true at time $t$ iff $\mathtt{pos}(t)<i$ ;

• **Discretized circle with beacons.** Now set $\mathbf{E}=[N-1]$ with $a_{i}$ ( $i=0,\ldots,N-1$ ) holding true iff $\mathtt{pos}(t)$ is close enough to $i$ , modulo $N$ ;

• **Discretized interval with random position sensors.** $\mathbf{E}=[N]$ again, and $\mathbb{A}=\{a_{1},\ldots,a_{N}\}$ , with $a_{i}$ true at time $t$ iff $\mathtt{pos}(t)\in A_{i}$ , where $A_{i}\subsetneq\mathbf{E}$ are chosen uniformly at random ahead of each simulation run.
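The three sensoria can be sketched as dictionaries of predicates on positions. The text does not specify how close counts as "close enough" on the circle, so the radius `r` below is a hypothetical parameter. The random sets $A_i$ are drawn uniformly among proper subsets of $\mathbf{E}$ by rejection sampling. Only the positive queries $a_i$ are listed; their complements are implicit.

```python
import random


def gps_interval(N):
    """Interval [N] = {0..N}: a_i (i = 1..N) holds at position p iff p < i."""
    return {f"a{i}": (lambda p, i=i: p < i) for i in range(1, N + 1)}


def beacons_circle(N, r=1):
    """Circle [N-1] = {0..N-1}: a_i holds iff p is within circular distance r
    of i. The radius r is a hypothetical stand-in for 'close enough'."""
    def near(p, i):
        d = abs(p - i) % N
        return min(d, N - d) <= r
    return {f"a{i}": (lambda p, i=i: near(p, i)) for i in range(N)}


def random_sensors(N, rng):
    """Interval [N]: a_i (i = 1..N) holds iff p is in A_i, each A_i drawn
    uniformly among the proper subsets of {0..N} (rejection sampling)."""
    E = range(N + 1)
    sensors = {}
    for i in range(1, N + 1):
        while True:
            A = frozenset(p for p in E if rng.random() < 0.5)
            if len(A) < N + 1:  # reject A = E itself
                break
        sensors[f"a{i}"] = (lambda p, A=A: p in A)
    return sensors


def observe(sensors, p):
    """The set of (positive) queries holding at position p."""
    return {name for name, holds in sensors.items() if holds(p)}


print(sorted(observe(gps_interval(3), 1)))  # prints ['a2', 'a3']
```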

We consider different value signals, all set to be functions of the position, depending on snapshot type:

• **Qualitative Snapshots.** Two natural choices of the signal are considered,

[TABLE]

where $T$ should be regarded as a “target” position of high significance.

• **Real-valued Snapshots.** To parallel the “sharp peak”/“dull peak” signal variants from the qualitative setting, we pick:

[TABLE]

respectively. For discounted snapshots, the discount coefficients were picked to be $q=0.999$ . Learning thresholds are constant, where relevant, and are chosen to equal $\tfrac{1}{2N}$ to ensure correct learning of implications among the initial sensors by the real-valued snapshots.

5.2. Simulation results for observers.

To assess the speed and quality of PCR learning, we track over time the error rate of the learned PCR representation, that is, the fraction of incorrectly learned PCR implications.

5.2.1. Repeated i.i.d. sampling (PAC-style setting).

Figure 3 compares logarithmic plots of two mean error rates over $100$ observation sequences generated by repeated i.i.d. uniform sampling of positions from the environment, for the settings described in Section 5.1 for $N=20$ :

(1) **Solid lines.** The mean fraction of incorrect implications in the learned PCR relative to the expected PCR for the given learner in each setting, as a function of time;

(2) **Dashed lines.** The mean fraction of incorrect implications in the transitive closure of the learned PCR relative to the poc set of actual implications among the provided sensors, as a function of time;

(3) **Shaded regions** depict the mean $\pm$ standard deviation for the corresponding quantities.
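The two curves differ in whether the learned PCR is first closed under transitivity. The sketch below computes the transitive closure with Warshall's algorithm, assuming the learned implications are given as pairs $(a,b)$ meaning $a\rightarrow b$, already closed under contrapositives. The error metric here is hypothetical: the fraction of ordered pairs of distinct queries on which the learned and reference relations disagree. The toy data show how one approximate implication can produce a second error through a chain.

```python
from itertools import product


def transitive_closure(implications, queries):
    """Smallest transitive relation containing the given (a, b) pairs, read
    as 'a implies b' (Warshall's algorithm; k is the outermost index)."""
    reach = set(implications)
    for k, i, j in product(queries, repeat=3):
        if (i, k) in reach and (k, j) in reach:
            reach.add((i, j))
    return reach


def error_rate(learned, reference, queries):
    """Hypothetical metric: fraction of ordered pairs (a, b), a != b, on which
    the learned and reference implication sets disagree."""
    pairs = [(a, b) for a in queries for b in queries if a != b]
    wrong = sum((p in learned) != (p in reference) for p in pairs)
    return wrong / len(pairs)


queries = ["a", "b", "c"]
learned = {("a", "b"), ("b", "c")}   # a -> b is an approximate (false) implication
reference = {("b", "c")}
closed = transitive_closure(learned, queries)
print(sorted(closed))  # prints [('a', 'b'), ('a', 'c'), ('b', 'c')]
```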

The most notable feature of the figures, beyond confirming (and, in fact, exceeding) the theoretical results, is the complete agreement of the curves for all six learners on the interval (left column). Since the sensors in this case are nested, the poc set of true implications coincides with the derived PCR induced by the expected weights and is recovered quickly and completely.

Next, on the circle we begin to see the difference between the quality of the learned PCR and the quality of the inferred system of implications as compared to the real ones. This deterioration in quality was to be expected, as transitive closure enables the deduction of implications from chains of approximate implications recorded in the PCR. Observe that the discrepancy is bigger for the sharp peak settings, in which a very small degree of significance is assigned to positions farther away from the target. This difference is most notable in the qualitative learners: while completely absent in the dull peak setting, it is very visible in the sharp peak setting. We account for these differences, among other things, in the detailed analysis of the true PCR provided in Appendix F.

A similar, though less pronounced, discrepancy is visible in the third column; keep in mind that, in this column, each run was executed with a different random collection of sensors. The discrepancy is smaller than on the circle because a random sensorium has a non-negligible probability of containing nested sensors, whereas the rotation-invariant sensorium we chose for the circle has very little nesting. A nesting relation $\rho(a)\subset\rho(b)$ in the sensorium forces $\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}\big{|}_{\scriptscriptstyle{t}}=0$ in real-valued snapshots, and $\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}\big{|}_{\scriptscriptstyle{t}}=\infty$ in qualitative snapshots, at all times $t$ , guaranteeing that $a<b$ will be learned with sufficient exposure. With so little nesting, the circle sensorium leaves much more room for error if the provided value signal happens to discount too many positions as insignificant. Deeper differences arise as a result of the circle’s non-trivial homotopy type, which we discuss in Section 5.4.2 and further in Section 6.

Finally, let us remark that we do not yet have a satisfactory explanation for the good performance of the discounted learners. We were unable to prove any concentration inequalities for the discounted weight update to parallel the ones obtained for the empirical one. Moreover, the quality of learning appears to be very sensitive to the choice of discount parameter. In fact, it was this difficulty with appropriately selecting and controlling the discount parameter that motivated the construction of qualitative learners in the first place.

5.2.2. Lazy random walk (learning from “motor babble”).

For a robotic system, a more realistic mode of sampling from the environment is “motor babble”: a random walk on $\mathbf{E}$ generated by repeated i.i.d. sampling from the space of available actions/decisions. In this mode, each instance of the agent $\mathscr{A}$ is constrained to a small set of available actions, depending on $\mathbf{E}$ :

• **Discretized interval.** The allowed actions are a single step to the right ( $\mathtt{rt}\colon p\mapsto\min\{N,p+1\}$ ), a single step to the left ( $\mathtt{lt}\colon p\mapsto\max\{p-1,0\}$ ), or remaining in place;

• **Discretized circle.** Similarly, on the circle the allowed actions are $\mathtt{rt}\colon p\mapsto p+1(\mathrm{mod}\;N)$ , $\mathtt{lt}\colon p\mapsto p-1(\mathrm{mod}\;N)$ , or doing nothing at all.
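Motor babble can be sketched as a lazy random walk driven by these actions. The text does not state the distribution over actions, so here we assume each of $\mathtt{rt}$, $\mathtt{lt}$ and "stay" is picked with probability $\tfrac{1}{3}$. On the interval this makes the transition matrix symmetric: interior states stay put with probability $\tfrac{1}{3}$, the endpoints with probability $\tfrac{2}{3}$. The limiting distribution is therefore uniform on $[N]$. The function names are hypothetical.

```python
import random


def step_rt(p, N, circle):
    """rt: one step to the right (clamped at N on the interval)."""
    return (p + 1) % N if circle else min(N, p + 1)


def step_lt(p, N, circle):
    """lt: one step to the left (clamped at 0 on the interval)."""
    return (p - 1) % N if circle else max(p - 1, 0)


def motor_babble(N, steps, rng, circle=False, p0=0):
    """Lazy random walk: each step picks rt, lt or 'stay' uniformly at random."""
    p, path = p0, [p0]
    for _ in range(steps):
        move = rng.choice(("rt", "lt", "stay"))
        if move == "rt":
            p = step_rt(p, N, circle)
        elif move == "lt":
            p = step_lt(p, N, circle)
        path.append(p)
    return path


# Interval [5] = {0..5} with GPS sensors: the fraction of time with pos < 3 is
# the empirical weight of a_3 under phi = 1, and should approach 3/6.
path = motor_babble(5, 200_000, random.Random(1))
frac = sum(p < 3 for p in path) / len(path)
print(round(frac, 2))
```

By Proposition 4.22, `frac` should be close to $0.5$ for long runs, even though consecutive positions are strongly correlated.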

Figure 4 shows the evolution of the error rates we had considered earlier in Section 5.2.1, in the new sampling mode.

This set of plots provides a good illustration of the robustness of UMA learning—especially with qualitative snapshots—where the quality of learning improves over time (though now at a much slower pace, due to the change in the sampling process), as the observer gains more exposure to the observed system.

5.2.3. Learning the target set over time.

We compare how UMA learners of different snapshot types develop their notion of the target set, $\mathtt{M}(\mathtt{w}_{{\scriptscriptstyle\bullet}})$ , over time. For this purpose, Figure 5 shows this evolution for a single run from a separate batch of lazy random walk observations in a smaller environment ( $N=10$ ), over a shorter period of time ( $500$ cycles). The features observed in this plot are, however, typical of the runs we generated for Figure 4. “Downgrading” the experiment to a smaller environment enabled faster learning, and hence allowed us to plot the run at a lower resolution, so that its significant features can be discerned without magnifying the plot.

Observe the eventual precision and efficiency of the qualitative reasoners, compared to the drift (away from the target) clearly noticeable for the real-valued learners. Also note some initial delay in learning the target (in comparison with other types) in the discounted learners: the value of $q$ places a bound on how quickly an implication may be learned.

Both these observations are typical of all the batches we have observed. This suggests that qualitative UMA learners are the most promising basis for upgrading UMAs to learn in the dynamic setting. It also suggests that the real-valued learners could benefit from more careful shaping of the value signal, with significantly sharper peaks, as well as from lower values of the discount parameter (for discounted learners), if learning on shorter time scales is important.

5.3. Binary UMA agents.

Postponing a more general formal definition of a binary UMA agent to another paper, let us describe just the simple sub-class of these agents considered here.

Actions as agents.

Given the environment $\mathbf{E}$ and the associated set of queries $\mathbb{A}$ as described above in Section 5.1, we regard each of the actions $\alpha$ available to $\mathscr{A}$ as an individual agent $\mathscr{A}_{\alpha}$ , in charge of making the decision whether to act ( $\alpha$ ) or not to act ( $\alpha{{}^{\scriptscriptstyle\ast}}$ ). Any conflicts between decisions made by different $\mathscr{A}_{\alpha}$ are, at this stage of development, arbitrated by hard-wiring (see example in Section 5.4.1 below).

Extended query set.

For $\mathscr{A}_{\alpha}$ to be capable of considering the consequences of its decisions, we have to extend $\mathbb{A}$ so as to enable reasoning about the past. Specifically, each $\mathscr{A}_{\alpha}$ is assigned a value signal $\varphi_{\alpha}$ , and an initial set of queries $\mathbf{\Sigma}_{\alpha}:=\mathbf{\Sigma}(\mathbb{A}\cup\sharp\mathbb{A})$ , where $(\sharp)$ is the delay operator: the query $\sharp q$ holds true at time $t+1$ if and only if $q$ held true at time $t$ .
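The delay operator can be sketched on a stream of observations, each given as the set of queries true at that time. We write `"#q"` as an ASCII stand-in for $\sharp q$. Since the first observation has no predecessor, the extended stream below starts at $t=1$. This convention is our assumption; the text does not fix the value of delayed queries at $t=0$.

```python
def with_delays(stream):
    """Extend each observation u_t, t >= 1, with delayed queries: '#q' holds at
    time t iff q held at time t-1. The first observation has no predecessor,
    so the extended stream starts at t = 1."""
    return [set(cur) | {"#" + q for q in prev}
            for prev, cur in zip(stream, stream[1:])]


print(sorted(with_delays([{"a"}, {"b"}, {"a", "b"}])[0]))  # prints ['#a', 'b']
```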

UMA representation conditional on action.

The binary UMA agent (BUA) $\mathscr{A}_{\alpha}$ maintains two snapshots, $\mathtt{w}^{\beta}_{{\scriptscriptstyle\bullet}}$ , $\beta\in\{\alpha,\alpha{{}^{\scriptscriptstyle\ast}}\}$ . The $2$ -weight $\mathtt{w}^{\beta}_{{\scriptscriptstyle\bullet}}$ is updated precisely in those transitions in which $\mathscr{A}_{\alpha}$ acted according to $\beta$ . Thus, at any time $t$ , $\mathtt{w}^{\beta}_{{\scriptscriptstyle\bullet}}\big{|}_{\scriptscriptstyle{t}}$ may be used to infer implications conditioned on $\beta$ taking place, by computing a derived graph, $G^{\beta}\big{|}_{\scriptscriptstyle{t}}$ .

Prediction.

Given the current state $\mathtt{Curr}^{\beta}\big{|}_{\scriptscriptstyle{t}}$ at time $t$ as represented by the $\beta$ snapshot, $\beta\in\{\alpha,\alpha{{}^{\scriptscriptstyle\ast}}\}$ , the prediction for time $(t+1)$ given $\beta$ , $\mathtt{Pred}^{\beta}\big{|}_{\scriptscriptstyle{t+1}}$ , is defined to be the coherent projection of $\sharp\mathtt{Curr}^{\beta}\big{|}_{\scriptscriptstyle{t}}$ with respect to $G^{\beta}\big{|}_{\scriptscriptstyle{t}}$ . This is the collection of sensations which $\mathscr{A}_{\alpha}$ can prove will occur if $\beta$ is chosen to take place, provided, of course, $G^{\beta}\big{|}_{\scriptscriptstyle{t}}$ persists into the $(t+1)$ -st cycle.

Decision.

At the same time, each of the agent’s two snapshots has a notion of where it is that the agent should be: the subset $\mathfrak{h}(\mathtt{M}(\mathtt{w}^{\beta}_{{\scriptscriptstyle\bullet}});G^{\beta}\big{|}_{\scriptscriptstyle{t}})$ . A simple way for $\mathscr{A}_{\alpha}$ to make a choice of $\beta\in\{\alpha,\alpha{{}^{\scriptscriptstyle\ast}}\}$ is to pick the value of $\beta$ for which $\mathtt{Div}(\mathtt{Pred}^{\beta}\big{|}_{\scriptscriptstyle{t+1}};\mathtt{M}(\mathtt{w}^{\beta}_{{\scriptscriptstyle\bullet}})\big{|}_{\scriptscriptstyle{t}})$ is smaller (recall Definition 2.39), and to flip a fair coin in the case of a tie.
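The decision step reduces to comparing two numbers once the divergences are known. In the sketch below, the two values $\mathtt{Div}(\mathtt{Pred}^{\beta};\mathtt{M}(\mathtt{w}^{\beta}))$ for $\beta=\alpha$ and $\beta=\alpha^{\ast}$ are taken as given inputs, since computing them requires Definition 2.39 and the coherent projection, which are not reproduced here. The function name `decide` is hypothetical.

```python
import random


def decide(div_act, div_rest, rng):
    """Return True (act, beta = alpha) or False (do not act, beta = alpha*),
    whichever has the smaller divergence between its prediction and its
    target set; a fair coin breaks ties. The divergences are taken as given."""
    if div_act != div_rest:
        return div_act < div_rest
    return rng.random() < 0.5


rng = random.Random(0)
print(decide(1.0, 2.0, rng), decide(3.0, 2.0, rng))  # prints True False
```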

5.4. Simulation results for agents.

5.4.1. Sniffy: locating a stationary target using “place field” sensors.

Consider an agent $\mathscr{A}$ in one of the two fixed settings described above in Section 5.1, with two actions $\mathtt{rt}$ and $\mathtt{lt}$ , as defined in Section 5.2.2, implemented as BUAs according to Section 5.3 with the value signals given in Equations (30) and (31). To minimize interference between $\mathtt{rt}$ and $\mathtt{lt}$ , we impose a hard-wired arbitration mechanism: if $\mathtt{lt}$ and $\mathtt{rt}$ decide to act at the same time, a Bernoulli( $\tfrac{1}{2}$ ) random trial decides which one of them to suppress.
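The arbitration rule can be sketched as a function that turns the two BUAs' decisions into a single step direction. The encoding of the result as $+1$ (for $\mathtt{rt}$), $-1$ (for $\mathtt{lt}$) and $0$ (no move) is our own choice.

```python
import random


def arbitrate(rt_acts, lt_acts, rng):
    """Hard-wired arbitration between the rt and lt agents. If both decide to
    act, a fair Bernoulli trial suppresses one of them. Returns the resulting
    step direction: +1 (rt), -1 (lt) or 0 (neither acts)."""
    if rt_acts and lt_acts:
        return 1 if rng.random() < 0.5 else -1
    if rt_acts:
        return 1
    if lt_acts:
        return -1
    return 0


rng = random.Random(0)
print(arbitrate(True, False, rng), arbitrate(False, False, rng))  # prints 1 0
```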

At the beginning of each simulation run, Sniffy (our pet agent $\mathscr{A}$ ) and its target are placed in random positions in $\mathbf{E}$ , denoted $\mathtt{pos}(0)$ and $T$ , respectively. The agent then experiences a training period during which every decision by every BUA is overridden by a random one, resulting in a lazy random walk. Once the training period is over, the BUAs are given control authority, with Sniffy acting according to their decisions.

Finally, following the indications of Section 5.2.3, we have chosen to replace the discount parameter of $q=0.999$ with $q=1-\tfrac{1}{N+1}$ , to enable a faster response by the discounted learners.
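One way to see why this choice speeds up the discounted learners: under a fixed discount $q$, an observation made $k$ updates ago carries weight $(1-q)q^{k}$. Its influence therefore halves every $\ln(1/2)/\ln q$ updates. The sketch below computes this half-life. The value $N=20$ is an illustrative choice borrowed from Section 5.2.1. For that $N$, the half-life is roughly $693$ updates for $q=0.999$ and about $14$ updates for $q=1-\tfrac{1}{N+1}$.

```python
from math import log


def half_life(q):
    """Updates after which an observation's weight (1 - q) * q**k in a
    fixed-discount snapshot has dropped to half its initial value (q**k = 1/2)."""
    return log(0.5) / log(q)


N = 20  # illustrative environment size
print(half_life(0.999), half_life(1 - 1 / (N + 1)))
```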

Figure 6 reports the results of our simulations. Each plot shows the mean, plus/minus standard deviation, over $100$ distinct runs, of the distance of the agent to its target as a function of time, in each setting.

Section A.2.4 discusses the representations expected to arise in the case of the interval in some detail, explaining Sniffy’s success in that environment, shown in the figure. However, we also notice a deterioration of the results as Sniffy is moved from the interval to the circle. This is due to subtle interactions between the propagation mechanism generating the BUAs’ predictions (which drives decision-making), and the non-trivial homotopy type of the circle, which forces inconsistent states into all the model spaces involved (the latter, we recall, are always contractible). This discrepancy between the topology of UMA model spaces and the spaces they come to model provides the main motivation for our future project of studying the control of situated agents by networks of BUAs, where the deliberation among agents is meant to generate an emergent joint representation of reactive behavior patterns with the competence to overcome topological constraints and obstacles (more in Section 6).

5.4.2. What did Sniffy learn on the circle?

All the graphs in Figure 6 indicate a significant change of behavior at the end of training. It therefore seems sensible to split the set of runs in each setting into those finishing closer to the target and those finishing closer to its antipodal point on the circle, as shown in Figure 7.

What emerges is that all the learned representations experience difficulties dealing with the situation loosely characterized as “Sniffy approaches the point on the circle antipodal to the target”. Note that the “dull peak” qualitative learners emerge as the most apt, both in terms of efficiency and in terms of separation between the desirable and undesirable modes of behavior. In this setting, the target clearly emerges as an attracting point except for a small neighbourhood of its antipode, which seems to play the role of an unstable equilibrium. This is reminiscent of gradient descent over the function $f(x)=\mathtt{dist}\!\left(x,T\right)$ on the unit circle, viewed as a differentiable manifold: the target $T$ is a robust attractive equilibrium, complemented by an unstable equilibrium that is forced by the non-trivial homotopy type of the circle. Since qualitative snapshots enable direct computation of the eventual values of the snapshot weights, it becomes possible to obtain explicit insights into the behavior learned by Sniffy in this setting. We refer the reader to Appendix F for a detailed discussion proving the preceding claims.
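The gradient-descent analogy can be illustrated on the discrete circle $\mathbb{Z}/N$. Greedy descent of $x\mapsto\mathtt{dist}(x,T)$ moves to whichever of $x\pm 1$ or $x$ is closest to $T$. The target $T$ is then a fixed point. For even $N$, the antipode of $T$ is the only position where the two directions tie, and the tie is broken by a fair coin. This is a toy illustration of the topological picture, not the behavior Sniffy actually learns. The function names are hypothetical.

```python
import random


def circ_dist(x, T, N):
    """Distance between positions x and T on the discrete circle Z/N."""
    d = abs(x - T) % N
    return min(d, N - d)


def greedy_step(x, T, N, rng):
    """Move to whichever of x+1, x-1, x is closest to T; ties (which occur,
    for even N, between the two neighbours of the antipode of T) are broken
    by a fair coin."""
    options = [(x + 1) % N, (x - 1) % N, x]
    best = min(circ_dist(y, T, N) for y in options)
    return rng.choice([y for y in options if circ_dist(y, T, N) == best])


rng = random.Random(0)
x, N, T = 5, 10, 0  # start at the antipode of T
for _ in range(10):
    x = greedy_step(x, T, N, rng)
print(x)  # prints 0
```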

6. Discussion.

Motivated by the goal of implementing well-reasoned general learning on mobile robots, this paper introduces algorithms implementing a simplified version of iterated belief revision and update that is consistent with budgetary constraints on storage space and computational complexity, collectively named “universal memory architectures” (UMAs). We establish and study the mathematical language necessary for the analysis of UMA instances, and show how the standard model-theoretic approach to belief revision gets naturally replaced by the study of the geometry of convex sets in the model spaces represented by UMAs.

By construction, UMA representations are systems of default rules that are closed under contrapositives. We show that such representations may be learned both by means of sampling and statistical integration of a real-valued signal (empirical and discounted snapshots, Section 4.2) and by means of aggregating samples of a ranking function on the space of possible worlds, in the sense of Spohn [46] and Pearl [39] (qualitative snapshots, Section 4.1). In the latter case, we are able to guarantee the correct encoding of the convex hull, in the learned geometry, of the set of minimum rank worlds, given sufficient exposure. Finally, we show the potential of UMA representations for the motivating application by considering their behavior in a pair of simple learning settings simulating a standard task formulation from Robotics: localize a target in the presence of (highly impoverished) sensing in a global frame (Section 5.4.1).

The need for expanding the set of queries (‘self-enrichment’).

It is important to state clearly the limitations of UMA learners in the form presented in this paper. From a practical perspective, attempting to learn a PCR structure for a fixed sensorium will yield no learning at all in the case of an arbitrary and/or ‘unstructured’ binary sensorium such as the pixel grid of a B/W video camera, where no two pixels are a priori correlated. Consider an even simpler example: the situation of $a,b,c\in\mathbf{\Sigma}$ satisfying $\rho(a)\cap\rho(b)\subseteq\rho(c)$ , $\rho(a)\nsubseteq\rho(c)$ and $\rho(b)\nsubseteq\rho(c)$ cannot be encoded by a PCR unless the query set $\mathbf{\Sigma}$ explicitly contains an element whose realization is $\rho(a)\cap\rho(b)$ . Finally, it is clear that PCRs are not geared for studying temporal interactions unless explicitly outfitted with appropriate queries (as in the example of BUAs in Section 5.3).

If we accept the above as the price of the radical reduction in computational costs achieved by UMA-based learning (as compared to unrestricted iterated belief revision), a natural avenue for increasing the descriptive power of a UMA representation is to allow the set of queries $\mathbf{\Sigma}$ to expand (by adding ‘meaningful’ queries) and contract (by coalescing related queries, or deleting uninformative ones) over time, in a controlled fashion, at a known and minimal cost in computational resources. The fixed sensorium $\mathbf{\Sigma}$ should be replaced with a sequence $\mathbf{\Sigma}\big{|}_{\scriptscriptstyle{t}}$ , and the map $\rho:\mathbf{\Sigma}\to\mathbf{2}^{\mathbf{X}}$ with a sequence $\rho\big{|}_{\scriptscriptstyle{t}}:\mathbf{\Sigma}\big{|}_{\scriptscriptstyle{t}}\to\mathbf{2}^{\mathbf{X}}$ . Still, the advantage of UMA representations over others lies in their efficiency at encoding a model space and reasoning about it in terms of its convex subspaces. This motivates the search for an enrichment method that meets the lower complexity bound for representing the observed system.

Looking for such a method, one must be mindful that the expansion steps cannot be arbitrary, as it is necessary for each map $\rho\big{|}_{\scriptscriptstyle{t+1}}:\mathbf{\Sigma}\big{|}_{\scriptscriptstyle{t+1}}\to\mathbf{2}^{\mathbf{X}}$ to be uniquely determined by its predecessor $\rho\big{|}_{\scriptscriptstyle{t}}$ and the limited information that was available to the UMA at time $t$ . This suggests two natural elementary expansion operations, which also happen to interact well with our detailed understanding of the geometry of duals:

**Append a conjunction.**

Adding a query of the form $q=a_{1}\wedge\cdots\wedge a_{k}$ for some $a_{1},\ldots,a_{k}\in\mathbf{\Sigma}\big{|}_{\scriptscriptstyle{t}}$ , to form $\mathbf{\Sigma}\big{|}_{\scriptscriptstyle{t+1}}=\mathbf{\Sigma}\big{|}_{\scriptscriptstyle{t}}\cup\{q,q{{}^{\scriptscriptstyle\ast}}\}$ , forces the extension of $\rho$ via $\rho(q)=\rho(a_{1})\cap\ldots\cap\rho(a_{k})$ .

**Append a delayed sensor.**

Let $\sharp:\mathbf{X}\to\mathbf{X}$ denote the operation of truncating the last state from a given history. It is then possible to introduce a query of the form $q=\sharp a$ for $a\in\mathbf{\Sigma}$ , where $\sharp a$ reports the value of $a$ preceding the current one; in other words, $x\in\rho(\sharp a)\Leftrightarrow\sharp x\in\rho(a)$ .

Since $\sharp(a\wedge b)=\sharp a\wedge\sharp b$ for all $a,b\in\mathbf{\Sigma}$, we conclude that any composition of the above extension operations determines a unique extension of the original $\rho$. Hence, a UMA endowed with these enrichment operations is capable, in principle, of eventually representing very rich theories of the observed system, both in terms of Boolean relations among the original sensors and in terms of temporal properties, provided we are willing to accept the cost in resources. Clearly, the burden is on us to decide when an extension is in order, for what purpose, and how to prevent the population of added sensors from exploding to a prohibitive size.
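The identity $\sharp(a\wedge b)=\sharp a\wedge\sharp b$, which is what makes composed extensions well defined, can be spot-checked by brute force. The sketch below uses a predicate encoding of queries over finite histories (hypothetical helpers; all length-4 histories over four states) and also checks one nested composition of the two operations.

```python
from itertools import product

def sharp(x):
    return x[:-1]                           # drop the current state

def delayed(a):
    return lambda x: a(sharp(x))            # x in rho(#a) iff #x in rho(a)

def conj(*qs):
    return lambda x: all(q(x) for q in qs)  # rho of a conjunction = intersection

def neg(a):
    return lambda x: not a(x)               # rho(a*) = complement of rho(a)

def below(p):
    return lambda x: x[-1] < p              # hypothetical threshold query

a, b = below(3), neg(below(1))
H = list(product(range(4), repeat=4))       # all length-4 histories over 4 states
lhs = delayed(conj(a, b))
rhs = conj(delayed(a), delayed(b))
assert all(lhs(x) == rhs(x) for x in H)     # #(a ^ b) = #a ^ #b
# A nested composition, #(#a ^ b) = ##a ^ #b, is pinned down the same way.
assert all(delayed(conj(delayed(a), b))(x) == conj(delayed(delayed(a)), delayed(b))(x)
           for x in H)
print("both identities hold on", len(H), "histories")
```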

In the presence of delayed queries, the situation lends itself to the formation of a prediction operator, extending the simplistic one constructed in Section 5.3. This makes it possible to formulate learning objectives concerning the quality of prediction. Our ongoing work exploring analogies with perceptron learning [37] is directed towards studying the problem of optimizing prediction through gradual extension of the sensorium using the operations just formulated.

Agents.

The stated motivation for this project was that of producing computationally efficient agents whose reasoning is grounded in a suitably relaxed (though still formally reasoned) form of iterated BR. At the same time, the model spaces encoded by UMAs are uniquely suited for reactive control: the selection of a control instruction in direct response to a localized (in time, as well as in space) perception of the task. Whenever the goal set is represented by a coherent selection on $\mathbf{\Sigma}$ (that is, the goal set is non-empty and convex in the relevant model space), propagation may be used to produce the nearest-point projection paths from the current state to the goal set within the model space; this helps determine the appropriate actions as those provably propelling the agent roughly along one of these paths, using the mechanism described in Section 5.3.

A pertinent question for our current research is whether or not it is possible to employ self-enrichment procedures (such as those described above) to guarantee, at least for some classes of problems, the emergence of a representation with the property that an agent’s predictions from time $t$ never fall outside the perceived current state at time $(t+1)$, for all $t$ large enough. If, and when, that becomes possible, one will have to conclude that any planning failure is due to an obstacle in the relevant UMA model space(s) originating from an attempt to navigate into an impossible perceptual class. This would open the door to methods for efficient representation of such classes, as well as the leveraging of such representations for correcting the simplistic control scheme of navigation along geodesics.

Improving representation using multiple agents.

The possible presence of obstacles focuses our attention on another important deficiency of UMA representations. While the concept representations they encode are always contractible when regarded as cubical complexes (see Section A.3 for more details), the concept representations corresponding to the ground truth will, more often than not, possess cavities/holes, serving the role of obstacles to navigation along geodesics in the UMA model space, and driving up the complexity of continuous planning [14]. An example of this phenomenon is already encountered in our simulations of target localization on the circle, in Section 5.4.2, and investigated in detail in Appendix F.

A possible solution to this problem might lie with the accumulation of a flexible collection of specialized agents, each with its own sensors and its own value signal, each correctly representing some aspects of the ‘physical’ agent’s tasks while having to rely on others in regions of its model space where its predictions fail. Fairly detailed descriptions of communities of this form have been proposed as possible models of human cognition by Minsky [35, 36], and studying the dynamics of such communities, charged with governing a situated agent, poses many interesting challenges.

In this context it is important to note that very recent results [41], demonstrating smooth(!) reactive switching between different control alternatives (behaviors/actions) using value-based motivational dynamics, provide a basis for speculation that (1) such methods may be applicable to our setting, too; and (2) formal understanding of the dynamics of the putative Minskian “societies” of UMAs just mentioned may be well within our reach.

Developing this approach will require the study of multi-agent systems incorporating means for the formation of “BUA coalitions”, for lack of a better term, to be recruited for action under appropriate circumstances. This is also where we expect the mathematical theory behind UMAs to prove most useful. Its categorical underpinnings (the fact that model spaces arise as dual spaces; see Section 2.2) provide a rigorous framework for comparing different models of the same system, and for studying the interaction between different perceptual components of a single model (see Sections A.2.4 and F where we carry out such detailed analysis).

Acknowledgements

This research was developed in part with funding from Air Force Research Lab (AFRL) grant FA865015D1845 (subcontract 669737-1), and in part with funding from the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Lab (AFRL) under agreement number FA8650-18-2-7840. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. The authors are grateful to Siqi Huang, a Penn CGGT Master’s graduate, for his relentless work developing a hardware-accelerated implementation of the UMA architecture, making the simulations in this study possible. We also thank Kostas Karydis for helping proof-read some of the initial material on ranking-based UMAs during his last months as a post-doctoral fellow at Penn’s GRASP lab.

Appendix A Appendix: The Duality Theory of Finite Poc Sets.

The purpose of this appendix is to review known results about the geometry of duals of finite poc sets, while illustrating them with simple examples which emphasize our application. An additional goal is to provide a sufficient technical background for proofs of new results in the appendices that follow.

The concept presentation of the dual of a poc set leads to a more intuitive understanding of the geometry of poc set duals. Recall from Section 2.1.2 that the concept representation of a subset $V\subset\mathbb{H}$ of vertices of the Hamming cube over a PCS $\mathbf{\Sigma}$ encodes the set of (cubical) faces of the Hamming cube obtained by deleting all faces containing at least one vertex of $\mathbb{H}\smallsetminus V$. The resulting structure is a (rather special) cubical complex (see [29], Chapter 2, for a very brief introduction to polyhedral, in particular cubical, complexes). One way in which such cubical complexes are special is that they are completely determined by their 1-dimensional skeleton, that is, by their collections of vertices and edges.

The resulting freedom to consider a higher dimensional “enveloping structure” for $\mathtt{Dual}\!\left(\mathbf{P}\right)$ when $\mathbf{P}$ is a poc set over $\mathbf{\Sigma}$ turns out to be useful in many ways, some of which we intend to explore in this section.

Definition A.1 (Dual Cubing).

Let $\mathbf{P}$ be a poc set structure over a finite PCS $\mathbf{\Sigma}$ . The dual cubing $\mathtt{Cube}\!\,(\mathbf{P})$ is the cubical complex obtained as the concept representation of the subset $\mathbf{P}^{\circ}\subset\mathbb{H}(\mathbf{\Sigma})$ .∎

At the very least, the ability to refer to $\mathtt{Cube}\!\,(\mathbf{P})$ will make it easier to visualize the graph $\mathtt{Dual}\!\left(\mathbf{P}\right)$, exposing its higher dimensional structure and bringing order to what otherwise would have been a chaos of edges (e.g. Figure 11). The notion of a dual cubing also makes it easier to understand Cartesian products of dual graphs (Section A.2.2 below). Finally, we will use the dual cubing to explain some fundamental properties and limitations of PCR presentations (Section A.2.3 below) relating to their universality (Proposition 2.22).

A.1. Nesting, Transversality and Cubes.

Fix a poc set $\mathbf{P}$ over a finite PCS $\mathbf{\Sigma}$ . The purpose of this section is to present the known characterizations of the cubes arising in $\mathtt{Cube}\!\,(\mathbf{P})$ . Some additional standard terminology will be needed. The following are from [43], Section 1.4:

Definition A.2 (proper elements, proper pairs).

Let $\mathbf{\Sigma}$ be a PCS. A proper element of $\mathbf{\Sigma}$ is any element $a\in\mathbf{\Sigma}$ such that $a\notin\{\mathbf{0},\mathbf{0}{{}^{\scriptscriptstyle\ast}}\}$. A pair $\{a,b\}$ of proper elements in $\mathbf{\Sigma}$ is said to be proper if $b\notin\{a,a{{}^{\scriptscriptstyle\ast}}\}$.∎

Definition A.3 (nesting, transversality).

Let $\mathbf{P}$ be a poc set. For any proper $a,b\in\mathbf{P}$ at most one of the following holds:

$$a<b,\qquad a<b^{\ast},\qquad a^{\ast}<b,\qquad a^{\ast}<b^{\ast}.$$

If any one of the above relations holds, we say that $a$ and $b$ are *nested*. Otherwise, we say that $a$ and $b$ are *transverse*, and write $a\pitchfork b$. Furthermore, for any $A\subset\mathbf{P}$, we say that $A$ is *nested* (resp. *transverse*) if every two elements of $A$ are nested (resp. transverse).∎
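For experimentation, nesting and transversality can be decided mechanically once a poc set is stored as a set of strict-order pairs. The sketch below is only an illustration under stated assumptions: elements are strings, the complement of `x` is spelled `x*`, the constants $\mathbf{0},\mathbf{0}^{\ast}$ are omitted, and the order is generated from a list of relations by closing under $a<b\Rightarrow b^{\ast}<a^{\ast}$ and under transitivity. A proper pair $a,b$ is then nested iff one of $a<b$, $a<b^{\ast}$, $a^{\ast}<b$, $a^{\ast}<b^{\ast}$ is in the order.

```python
def star(x):
    """Complement x <-> x*."""
    return x[:-1] if x.endswith("*") else x + "*"

def poc_order(gens):
    """Strict order generated by pairs (a, b), read a < b, closed under
    a < b => b* < a* and under transitivity (Warshall)."""
    lt = set(gens) | {(star(b), star(a)) for a, b in gens}
    elems = {e for pair in lt for e in pair}
    for k in elems:
        for i in elems:
            for j in elems:
                if (i, k) in lt and (k, j) in lt:
                    lt.add((i, j))
    return lt

def nested(a, b, lt):
    """Proper a, b (b not in {a, a*}) are nested iff a<b, a<b*, a*<b or a*<b*."""
    return any(p in lt for p in [(a, b), (a, star(b)), (star(a), b), (star(a), star(b))])

def transverse(a, b, lt):
    return not nested(a, b, lt)

# Threshold sensors of the bead example (Section A.2.1): a1 < a2 < a3.
P = poc_order([("a1", "a2"), ("a2", "a3")])
# Two unrelated chains a1 < a2 and b1 < b2: cross pairs are transverse.
PQ = poc_order([("a1", "a2"), ("b1", "b2")])
print(nested("a1", "a3", P), transverse("a1", "b1", PQ))   # True True
```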

Recalling that a poc set is, first and foremost, a partially ordered set, for any subset $S\subset\mathbf{P}$ it makes sense to consider

$$\min(S):=\{a\in S\;:\;\text{there is no }b\in S\text{ with }b<a\}.$$

Since $\mathbf{P}$ is finite, $\min(S)$ is non-empty whenever $S$ is. The following is Proposition 10.1 of [43], restricted to the finite case and parsed into more elementary language:

Lemma A.4 (when a vertex meets a cube).

Let $\mathbf{P}$ be a finite poc set and let $v\in\mathbf{P}^{\circ}$ . Let $Q$ be a $d$ -dimensional cube of $\mathtt{Cube}\!\,(\mathbf{P})$ . Then $v\in Q$ if and only if $\min(v)$ contains a transverse subset $T$ of $\mathbf{P}$ with the property that every vertex $u\in Q$ is of the form $u=\left[v\right]_{{}_{S}}:=(v\smallsetminus S)\cup S{{}^{\scriptscriptstyle\ast}}$ for some $S\subseteq T$ .∎

In particular:

•

every edge ( $1$ -cube) containing $v$ is spanned by $v$ and a vertex of the form $\left[v\right]_{{}_{a}}:=(v\smallsetminus\{a\})\cup\{a{{}^{\scriptscriptstyle\ast}}\}$ for some $a\in\min(v)$ ;

•

every square ( $2$ -cube) containing $v$ is spanned by $v$ , $\left[v\right]_{{}_{a}}$ , $\left[v\right]_{{}_{b}}$ and $\left[v\right]_{{}_{ab}}$ for some transverse pair $\{a,b\}\subseteq\min(v)$ .
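Lemma A.4 and its two consequences can be checked by brute force on a small example. The sketch below assumes a hypothetical string encoding (complement of `x` spelled `x*`, constants $\mathbf{0},\mathbf{0}^{\ast}$ left implicit), takes the poc set on $\{a,b,c\}$ with the single relation $a<c$, enumerates the vertices of its dual as complete coherent selections, and confirms that the edges at each vertex $v$ are exactly the pairs $\{v,[v]_{x}\}$ with $x\in\min(v)$, while squares arise from transverse pairs in $\min(v)$.

```python
from itertools import combinations, product

def star(x):
    """Complement x <-> x*."""
    return x[:-1] if x.endswith("*") else x + "*"

def poc_order(gens):
    """Strict order generated by pairs (a, b), read a < b, closed under
    a < b => b* < a* and under transitivity (Warshall)."""
    lt = set(gens) | {(star(b), star(a)) for a, b in gens}
    elems = {e for pair in lt for e in pair}
    for k in elems:
        for i in elems:
            for j in elems:
                if (i, k) in lt and (k, j) in lt:
                    lt.add((i, j))
    return lt

def dual_vertices(letters, lt):
    """Complete selections with no x, y such that x < y* (0, 0* left implicit)."""
    return [frozenset(c) for c in product(*[(x, star(x)) for x in letters])
            if not any((x, star(y)) in lt for x in c for y in c)]

def minimal(S, lt):
    return {x for x in S if not any((y, x) in lt for y in S)}

def flip(v, S):
    """[v]_S = (v minus S) together with S*."""
    return (v - set(S)) | {star(x) for x in S}

def transverse(x, y, lt):
    return not any(p in lt for p in [(x, y), (x, star(y)), (star(x), y), (star(x), star(y))])

lt = poc_order([("a", "c")])           # a < c, b unrelated to both
V = dual_vertices(["a", "b", "c"], lt)

for v in V:                            # edges at v are {v, [v]_x}, x in min(v)
    neighbours = {u for u in V if len(v - u) == 1}
    assert neighbours == {flip(v, [x]) for x in minimal(v, lt)}

squares = {frozenset({v, flip(v, [x]), flip(v, [y]), flip(v, [x, y])})
           for v in V
           for x, y in combinations(sorted(minimal(v, lt)), 2)
           if transverse(x, y, lt)}
print(len(V), len(squares))            # 6 2
```

The result is a ladder of two squares: a path of length two (coming from $a<c$) times an edge (coming from the unrelated $b$).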

These properties give rise to a new understanding of how the half-spaces $\{\mathfrak{h}(a;\mathbf{P})\}_{a\in\mathbf{P}}$ in $\mathtt{Dual}\!\left(\mathbf{P}\right)$ interact with the geometry of $\mathtt{Cube}\!\,(\mathbf{P})$: one could think of the splitting of $\mathbf{P}^{\circ}$ in the form $\mathfrak{h}(a;\mathbf{P})\cup\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}};\mathbf{P})$ as the result of cutting the cubing $\mathtt{Cube}\!\,(\mathbf{P})$ along the hyperplane formed by the union of the perpendicular bisectors of all edges of the form $\{v,\left[v\right]_{{}_{a}}\}$ (see Figure 8).

In addition, the last lemma plays a crucial role in deducing some fundamental properties of $\mathtt{Cube}\!\,(\mathbf{P})$ (Proposition 10.2 of [43]):

Theorem A.5.

Let $\mathbf{P}$ be a finite poc set. Then $\mathtt{Cube}\!\,(\mathbf{P})$ is contractible. (Contractibility of a topological space is a fundamental notion in Topology, formalizing the idea of a “space with no holes”; see [20], Chapter 0, for a quick and very intuitive introduction.)∎

Moreover, the lemma implies that $\mathtt{Cube}\!\,(\mathbf{P})$ is non-positively curved (see [50], Section 2.1). This produces a characterization of complexes of the form $\mathtt{Cube}\!\,(\mathbf{P})$ (Theorem 10.3 of [43]):

Theorem A.6 (characterization of cubings).

A cubical complex arises as the dual of a finite poc set if and only if it is contractible and non-positively curved.∎

All of the above applies in far more general settings than the finite one; the interested reader should consult [43].

A.2. Examples of Duals.

To improve the reader’s intuition regarding dual graphs of poc sets, as well as to illustrate one of the example simulations (Section 5.4.1), we consider a sequence of examples in light of the results of Section 2.4.

A.2.1. Example: a bead on a string.

Suppose the system being observed consists of a bead strung on a tight piece of string. The observed state of the system is modeled by the interval $[0,1]$ in the obvious way, so the space of histories $\mathbf{X}$ is the set of sequences $x=(x_{n})_{n=-\infty}^{0}$, where $\mathtt{pos}(x):=x_{0}$ is the current position of the bead given $x$, $x_{-1}$ is its previous position, and so on. Let us set $\mathbf{\Sigma}=\{\mathbf{0},\mathbf{0}^{\ast},a_{1},a_{1}^{\ast},\ldots,a_{L},a_{L}^{\ast}\}$ and equip it with two different poc set structures, $\mathbf{P}$ and $\mathbf{Q}$, defined by the relations $a_{k}<a_{k+1}$, $1\leq k<L$, in $\mathbf{P}$ and $a_{i}<a_{j}^{\ast}$, $1\leq i<j\leq L$, in $\mathbf{Q}$. These may be regarded as PCR representations of two different sensoria, constructed as follows. Let $p_{1}<\ldots<p_{L}$ in $(0,1)$ be points that are pairwise at least $2\epsilon$ apart, where $0<\epsilon<\tfrac{1}{2(L+1)}$. Then $\mathbf{P}$ may be realized by setting $x\in\rho(a_{k})\Leftrightarrow\mathtt{pos}(x)<p_{k}$ (“threshold sensors”), while $\mathbf{Q}$ may be realized, for example, by $x\in\rho(a_{k})\Leftrightarrow\mathtt{dist}\!\left(\mathtt{pos}(x),p_{k}\right)<\epsilon$ (“beacon sensors”).

The vertices of $\mathtt{Dual}\!\left(\mathbf{P}\right)$ have the form $V_{k}=\{\mathbf{0^{\ast}}\}\cup\{a_{j}^{\ast}\}_{j\leq k}\cup\{a_{i}\}_{i>k}$, $0\leq k\leq L$, with an edge joining $V_{k}$ to $V_{k+1}$ for all $k<L$ (recall that edges in $\mathtt{Dual}\!\left(\mathbf{P}\right)$ are edges of the Hamming cube $\mathbb{H}=\mathbb{H}(\mathbf{\Sigma})$). The graph $\mathtt{Dual}\!\left(\mathbf{Q}\right)$ has a different collection of vertices, dictated by the fact that all pairs $\{a_{i},a_{j}\}$ with $i\neq j$ are incoherent: there is a ‘special’ vertex $V^{\prime}_{0}=\{\mathbf{0^{\ast}},a_{1}^{\ast},\ldots,a_{L}^{\ast}\}$ and a collection of ‘generic’ ones, $V^{\prime}_{k}=\{\mathbf{0^{\ast}},a_{k}\}\cup\{a_{j}^{\ast}\}_{j\neq k}$, $1\leq k\leq L$; all the $V^{\prime}_{k}$, $k>0$, are adjacent to $V^{\prime}_{0}$, and no other pairs of vertices are adjacent. Figure 9 shows $\mathtt{Dual}\!\left(\mathbf{P}\right)$ (left), which is a path of length $L$, and $\mathtt{Dual}\!\left(\mathbf{Q}\right)$ (right), which we will refer to in the future as a starfish. Of the two model spaces, $\mathtt{Dual}\!\left(\mathbf{P}\right)$ appears to provide the better discretization of $[0,1]$. Note also that both duals are trees. This is a manifestation of the well-known fact that $\mathtt{Dual}\!\left(\mathbf{P}\right)$ is a tree if and only if $\mathbf{P}$ is nested (that is, any two elements of $\mathbf{P}$ are nested).
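The two bead sensoria can be compared by brute force. The following sketch takes $L=4$, encodes elements as strings (complement of `x` spelled `x*`, constants $\mathbf{0},\mathbf{0}^{\ast}$ left implicit; all helper names hypothetical), builds $\mathtt{Dual}(\mathbf{P})$ and $\mathtt{Dual}(\mathbf{Q})$ directly from the defining relations, and checks that both are trees, with $\mathtt{Dual}(\mathbf{P})$ a path and $\mathtt{Dual}(\mathbf{Q})$ a starfish whose hub is $V^{\prime}_{0}$.

```python
from itertools import combinations, product

def star(x):
    """Complement x <-> x*."""
    return x[:-1] if x.endswith("*") else x + "*"

def poc_order(gens):
    """Strict order generated by pairs (a, b), read a < b, closed under
    a < b => b* < a* and under transitivity (Warshall)."""
    lt = set(gens) | {(star(b), star(a)) for a, b in gens}
    elems = {e for pair in lt for e in pair}
    for k in elems:
        for i in elems:
            for j in elems:
                if (i, k) in lt and (k, j) in lt:
                    lt.add((i, j))
    return lt

def dual_vertices(letters, lt):
    """Complete selections with no x, y such that x < y* (0, 0* left implicit)."""
    return [frozenset(c) for c in product(*[(x, star(x)) for x in letters])
            if not any((x, star(y)) in lt for x in c for y in c)]

def edges(V):
    """Pairs of vertices at Hamming distance one."""
    return [(u, w) for u, w in combinations(V, 2) if len(u - w) == 1]

def is_tree(V, E):
    """Connected, with exactly |V| - 1 edges."""
    if len(E) != len(V) - 1:
        return False
    adj = {v: set() for v in V}
    for u, w in E:
        adj[u].add(w)
        adj[w].add(u)
    seen, todo = {V[0]}, [V[0]]
    while todo:
        for w in adj[todo.pop()] - seen:
            seen.add(w)
            todo.append(w)
    return len(seen) == len(V)

L = 4
letters = [f"a{k}" for k in range(1, L + 1)]
P = poc_order([(f"a{k}", f"a{k + 1}") for k in range(1, L)])        # thresholds
Q = poc_order([(f"a{i}", f"a{j}*")                                   # beacons
               for i in range(1, L + 1) for j in range(i + 1, L + 1)])
VP, VQ = dual_vertices(letters, P), dual_vertices(letters, Q)
EP, EQ = edges(VP), edges(VQ)
degP = max(sum(v in e for e in EP) for v in VP)
hub = frozenset(star(x) for x in letters)                            # V'_0
degQ = sum(hub in e for e in EQ)
print(len(VP), is_tree(VP, EP), degP)   # 5 True 2
print(len(VQ), is_tree(VQ, EQ), degQ)   # 5 True 4
```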

A.2.2. Example: Cartesian products of duals.

The easiest way to join two poc sets together is to form their direct sum:

Definition A.7.

Let $\mathbf{P}$ and $\mathbf{Q}$ be discrete poc sets. Their direct sum $\mathbf{P}\vee\mathbf{Q}$ is defined to be the quotient of their external disjoint union $\mathbf{P}\sqcup\mathbf{Q}$ by the identification $\mathbf{0}_{\mathbf{P}}=\mathbf{0}_{\mathbf{Q}}$ and $\mathbf{0^{\ast}}_{\mathbf{P}}=\mathbf{0^{\ast}}_{\mathbf{Q}}$, endowed with the following:

•

$a\leq b\text{ in }\mathbf{P}\vee\mathbf{Q}\Leftrightarrow(\{a,b\}\subseteq\mathbf{P}\text{ and }a\leq b\text{ in }\mathbf{P})\text{ or }(\{a,b\}\subseteq\mathbf{Q}\text{ and }a\leq b\text{ in }\mathbf{Q})$ ;

•

$a=b{{}^{\scriptscriptstyle\ast}}\text{ in }\mathbf{P}\vee\mathbf{Q}\Leftrightarrow(\{a,b\}\subseteq\mathbf{P}\text{ and }a=b{{}^{\scriptscriptstyle\ast}}\text{ in }\mathbf{P})\text{ or }(\{a,b\}\subseteq\mathbf{Q}\text{ and }a=b{{}^{\scriptscriptstyle\ast}}\text{ in }\mathbf{Q})$ .

We abuse notation by identifying each element of $\mathbf{P}\cup\mathbf{Q}$ with the equivalence class in $\mathbf{P}\vee\mathbf{Q}$ of its natural representative in $\mathbf{P}\sqcup\mathbf{Q}$ .∎

Consider the two inclusion maps, $p\colon\mathbf{P}\hookrightarrow\mathbf{P}\vee\mathbf{Q}$ and $q\colon\mathbf{Q}\hookrightarrow\mathbf{P}\vee\mathbf{Q}$ , each of which is an injective poc morphism. The dual maps $p^{\circ}$ and $q^{\circ}$ give rise to the median morphism $\mu:(\mathbf{P}\vee\mathbf{Q})^{\circ}\to\mathbf{P}^{\circ}\times\mathbf{Q}^{\circ}$ defined by $\mu(w)=(p^{\circ}(w),q^{\circ}(w))$ , where $p^{\circ}(w)=w\cap\mathbf{P}$ and $q^{\circ}(w)=w\cap\mathbf{Q}$ , by definition. Since every proper pair $a,b\in\mathbf{P}\vee\mathbf{Q}$ with $a\in\mathbf{P}$ and $b\in\mathbf{Q}$ satisfies $a\pitchfork b$ , it follows that $u\cup v$ is coherent for any $u\in\mathbf{P}^{\circ}$ and $v\in\mathbf{Q}^{\circ}$ , and we conclude that $\mu$ is bijective.

Finally, recall that two vertices $w,w^{\prime}$ of $\mathtt{Dual}\!\left(\mathbf{P}\vee\mathbf{Q}\right)$, with $\mu(w)=(u,v)$ and $\mu(w^{\prime})=(u^{\prime},v^{\prime})$, are joined by an edge iff $\left|w\smallsetminus w^{\prime}\right|=1$. Since the intersection of $\mathbf{P}$ with $\mathbf{Q}$ in $\mathbf{P}\vee\mathbf{Q}$ is trivial (it consists of $\mathbf{0}$ and $\mathbf{0^{\ast}}$ only), in terms of $u,u^{\prime},v,v^{\prime}$ we obtain:

$$\left|w\smallsetminus w^{\prime}\right|=\left|u\smallsetminus u^{\prime}\right|+\left|v\smallsetminus v^{\prime}\right|,$$

so that $w,w^{\prime}$ span an edge if and only if one of the pairs $\{u,u^{\prime}\}$, $\{v,v^{\prime}\}$ spans an edge while the other pair consists of a single vertex. Thus, $\mu$ is a median isomorphism of the dual graphs and we have:

Corollary A.8.

Let $\mathbf{P,Q}$ be discrete poc sets. Then the mapping

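$$\mu\colon\mathtt{Dual}\!\left(\mathbf{P}\vee\mathbf{Q}\right)\to\mathtt{Dual}\!\left(\mathbf{P}\right)\times\mathtt{Dual}\!\left(\mathbf{Q}\right),\qquad w\mapsto\left(w\cap\mathbf{P},\,w\cap\mathbf{Q}\right)$$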

is a median-preserving graph isomorphism.∎
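Corollary A.8 can be spot-checked on the smallest interesting case, two threshold poc sets $a_{1}<a_{2}$ and $b_{1}<b_{2}$ whose duals are paths with three vertices. The sketch below (hypothetical helper names, string encoding with `x*` for complements, constants omitted) forms the direct sum by pooling the generating relations over disjoint letters, then verifies that $w\mapsto(w\cap\mathbf{P},w\cap\mathbf{Q})$ is a bijection onto $\mathbf{P}^{\circ}\times\mathbf{Q}^{\circ}$ with $|w\smallsetminus w^{\prime}|=|u\smallsetminus u^{\prime}|+|v\smallsetminus v^{\prime}|$, so that the dual of the sum is the $3\times 3$ grid.

```python
from itertools import combinations, product

def star(x):
    """Complement x <-> x*."""
    return x[:-1] if x.endswith("*") else x + "*"

def poc_order(gens):
    """Strict order generated by pairs (a, b), read a < b, closed under
    a < b => b* < a* and under transitivity (Warshall)."""
    lt = set(gens) | {(star(b), star(a)) for a, b in gens}
    elems = {e for pair in lt for e in pair}
    for k in elems:
        for i in elems:
            for j in elems:
                if (i, k) in lt and (k, j) in lt:
                    lt.add((i, j))
    return lt

def dual_vertices(letters, lt):
    """Complete selections with no x, y such that x < y* (0, 0* left implicit)."""
    return [frozenset(c) for c in product(*[(x, star(x)) for x in letters])
            if not any((x, star(y)) in lt for x in c for y in c)]

P_letters, Q_letters = ["a1", "a2"], ["b1", "b2"]
P_gens, Q_gens = [("a1", "a2")], [("b1", "b2")]
VP = dual_vertices(P_letters, poc_order(P_gens))
VQ = dual_vertices(Q_letters, poc_order(Q_gens))
# Direct sum: disjoint letters, generating relations pooled, no cross relations.
VS = dual_vertices(P_letters + Q_letters, poc_order(P_gens + Q_gens))

P_lits = {x for a in P_letters for x in (a, star(a))}
mu = {w: (w & P_lits, w - P_lits) for w in VS}         # w -> (w & P, w & Q)
assert len(VS) == len(VP) * len(VQ)
assert set(mu.values()) == set(product(VP, VQ))         # so mu is a bijection
for w1, w2 in combinations(VS, 2):                     # Hamming distances add up
    (u1, v1), (u2, v2) = mu[w1], mu[w2]
    assert len(w1 - w2) == len(u1 - u2) + len(v1 - v2)

n_edges = sum(len(w1 - w2) == 1 for w1, w2 in combinations(VS, 2))
print(len(VS), n_edges)   # 9 12
```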

For an alternative argument, note that for any $u\in\mathbf{P}^{\circ}$ and $v\in\mathbf{Q}^{\circ}$ , if $S\subset\min(u)$ and $T\subset\min(v)$ are transverse sets, then $S\cup T\subset\min(u\cup v)$ and is a transverse set in $\mathbf{P}\vee\mathbf{Q}$ . Therefore, by Lemma A.4, every cube in $\mathtt{Cube}\!\,(\mathbf{P})\times\mathtt{Cube}\!\,(\mathbf{Q})$ corresponds to a unique cube in $\mathtt{Cube}\!\,(\mathbf{\mathbf{P}\vee\mathbf{Q}})$ . Thus $\mu$ from the corollary is much more than an isomorphism of graphs: it extends to an isomorphism of cubical complexes from $\mathtt{Cube}\!\,(\mathbf{P}\vee\mathbf{Q})$ onto $\mathtt{Cube}\!\,(\mathbf{P})\times\mathtt{Cube}\!\,(\mathbf{Q})$ .

A.2.3. Example: representing a circle.

Similarly to the example of a bead on a straight piece of string (Section A.2.1), one could consider a bead on a circular bracelet, replacing the interval $[0,1]$ with the unit circle $\mathbb{S}^{1}\subset\mathbb{C}$ in the complex plane. This time, let $p_{0},\ldots,p_{L-1}\in\mathbb{S}^{1}$ be a cyclically ordered collection of marker points, say, $p_{k}:=\exp(\tfrac{2\pi k\mathbf{i}}{L})$.

We will compare several different representations over the PCSs:

$$\mathbf{\Sigma}=\{\mathbf{0},\mathbf{0}^{\ast},a_{0},a_{0}^{\ast},\ldots,a_{L-1},a_{L-1}^{\ast}\}.$$

We regard $\mathbf{\Sigma}$ as a sensorium whose realization $\rho=\rho(L,\epsilon)$ is defined by setting $x\in\rho(a_{k})$ for a history $x$ if and only if the current state $x_{0}$ lies in the open circular arc of $\mathbb{S}^{1}$ centered at $p_{k}$ and having radius $\epsilon$. Depending on the choice of $L$ and $\epsilon$, different PCRs (and duals) may arise. Specifically, we consider the examples with $L=4$ and $\epsilon=\tfrac{\pi}{4},\tfrac{\pi}{3}$, and with $L=6$ and $\epsilon=\tfrac{\pi}{3},\tfrac{\pi}{2}$, to illustrate possible differences and shared qualities.

Jack Sparrow’s compass, $L=4$ .

Rather than keep track of the indices modulo 4 in this example, let us identify this setup with an everyday object: a compass. We denote

[TABLE]

Figure 10(left) depicts the subsets of $\mathbb{S}^{1}$ which determine $\rho(a_{i})$, $i=0,1,2,3$, for the realizations in the cases $\epsilon=\tfrac{\pi}{3}$ (A) and $\epsilon=\tfrac{\pi}{4}$ (B). Think of $\mathbb{S}^{1}$ as the space of all possible positions of a compass needle (the needle of this compass points in the direction of your heart’s greatest desire, which may not be a visit to the magnetic north pole). Then one should think of, e.g., $N_{\alpha}:=\rho_{\alpha}(\mathtt{n})$, $\alpha\in\{A,B\}$, as the set of positions of the needle with which observer $\alpha$ associates an affirmative answer to the question “Is the needle pointing North?”. The difference between the two examples is that $\rho_{B}(\mathtt{n}),\rho_{B}(\mathtt{s}),\rho_{B}(\mathtt{w}),\rho_{B}(\mathtt{e})$ are pairwise disjoint, while $\rho_{A}$ realizes the major directions so that only opposites are disjoint.

Let $G$ denote the poc set structure on $\mathbf{\Sigma}$ with relations of the form (here and below, indices and any arithmetic operations on them are taken modulo $L$)

$$a_{k}<a_{k+2}^{\ast},\qquad k=0,1,2,3.$$

Then $\rho$ is a poc morphism for either choice of $\epsilon$, and the right-hand side of Figure 10 illustrates the perceptual classes of $\rho$ (yellow highlighting) together with the edges they induce in the ambient structure, $\mathtt{Dual}\!\left(G\right)$. Note how case (A) produces an embedded cycle sub-graph in $\mathtt{Dual}\!\left(G\right)$, a coarse but topologically faithful reconstruction of $\mathbb{S}^{1}$ which is homotopically non-trivial, while case (B) produces a tree, a space homotopy equivalent to a point.
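The contrast between cases (A) and (B) can be reproduced numerically. The sketch below measures angles in full turns (so $\epsilon=\tfrac{\pi}{3}$ and $\tfrac{\pi}{4}$ become $\tfrac16$ and $\tfrac18$), uses exact fractions, and approximates the set of perceptual classes of $\rho$ by sampling 48 equally spaced points of $\mathbb{S}^{1}$; that sampling density is an assumption which suffices for these cases but not in general. Arcs are indexed by $k$ rather than by compass names, and the helper names are hypothetical. It counts the vertices, edges and squares of the Hamming-cube faces spanned by the classes.

```python
from fractions import Fraction
from itertools import combinations

def circ_dist(s, t):
    """Distance on the circle, angles measured in full turns."""
    d = abs(s - t) % 1
    return min(d, 1 - d)

def perceptual_classes(L, eps, M=48):
    """Sets {k : theta in arc k} over the samples theta = j/M (in turns), where
    arc k is open, centred at k/L, with half-width eps. Sampling is an assumption:
    M = 48 is fine enough for the cases below, not in general."""
    return {frozenset(k for k in range(L) if circ_dist(Fraction(j, M), Fraction(k, L)) < eps)
            for j in range(M)}

def cube_counts(classes, L):
    """Vertices, edges and squares of the Hamming-cube faces whose vertices are
    all perceptual classes (a class is identified with its set of true a_k)."""
    edges = [(u, w) for u, w in combinations(classes, 2) if len(u ^ w) == 1]
    squares = {frozenset({u, u ^ {x}, u ^ {y}, u ^ {x, y}})
               for u in classes for x, y in combinations(range(L), 2)
               if {u ^ {x}, u ^ {y}, u ^ {x, y}} <= classes}
    return len(classes), len(edges), len(squares)

# eps = pi/3 and pi/4 radians are 1/6 and 1/8 of a turn.
print(cube_counts(perceptual_classes(4, Fraction(1, 6)), 4))   # (8, 8, 0): case (A)
print(cube_counts(perceptual_classes(4, Fraction(1, 8)), 4))   # (5, 4, 0): case (B)
```

No squares occur in either case, so the complex is a graph: in case (A) the 8 classes and 8 edges form a single cycle, while in case (B) the empty class is joined to each of the 4 singleton classes, giving a star.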

While providing an illustration for Proposition 2.22, this example also highlights the need to discuss which properties of the realization map $\rho$ could guarantee a degree of fidelity of the observer’s reconstruction of the observed space (the space of histories $\mathbf{X}$? the ‘environment’ $\mathbb{S}^{1}$?) as, say, the sub-graph of $\mathtt{Dual}\!\left(G\right)$ induced by the perceptual classes.

Higher dimensions, $L=6$ .

Figure 11 compares the dual graphs/cubings of two poc set representations, each optimal for its corresponding choice of the value of $\epsilon$ . The case $\epsilon=\tfrac{\pi}{3}$ again has the property that non-consecutive $\rho(a_{j})$ are disjoint, implying that $\rho$ is a poc isomorphism of the poc set structure

$$a_{j}<a_{j+2}^{\ast},\qquad a_{j}<a_{j+3}^{\ast},\qquad j=0,\ldots,5,$$

onto its image in $\mathbf{2}^{\mathbb{S}^{1}}$. Denote this poc set structure on $\mathbf{\Sigma}$ by $\mathbf{P}_{1}$.

The case $\epsilon=\tfrac{\pi}{2}$ has fewer nesting relations among the $\rho(a_{j})$ , because every three consecutive sets of this form have a point in common. Formally, $\rho$ is a poc isomorphism of the poc set structure described by:

$$a_{j}<a_{j+3}^{\ast},\qquad j=0,\ldots,5,$$

onto its image. Denote this poc set structure on $\mathbf{\Sigma}$ by $\mathbf{P}_{2}$ . Note that the identity map $\mathrm{id}\colon\mathbf{P}_{2}\to\mathbf{P}_{1}$ is a poc morphism, while its inverse is not: a poc morphism $f$ is allowed to map a transverse pair to a nested one, but not the other way around. The dual of this map embeds $\mathtt{Dual}\!\left(\mathbf{P}_{1}\right)$ in $\mathtt{Dual}\!\left(\mathbf{P}_{2}\right)$ . This embedding can be seen clearly in Figure 11(right).

A.2.4. Example: moving bead on an interval.

Returning to the ‘thresholds’ example of Section A.2.1, we would like to consider it from the point of view of the agents described in Section 5.4.1.

The interval $[0,1]$ from Section A.2.1 will now be replaced with $[0,L]$ , $L$ a positive integer, for convenience. Once again we are given position sensors $a_{1},\ldots,a_{L}\in\mathbf{\Sigma}$ —more sensors will be added to $\mathbf{\Sigma}$ in a moment—with realizations $x\in\rho(a_{j})\Leftrightarrow\mathtt{pos}(x)<j$ , where we recall that $x=(x_{t})_{t=-\infty}^{0}$ , $x_{t}\in[0,L]$ is our current notion of a history, and $\mathtt{pos}(x):=x_{0}$ is the current position of the bead on $[0,L]$ .

This time we are interested in reasoning about the possible motion of the bead along the interval, so we introduce delayed sensors $\sharp a_{j}$ into $\mathbf{\Sigma}$ alongside the original position sensors. Formally, the delay operator acts on histories via $(\sharp x)_{t}:=x_{t-1}$, and acts on sensors via $x\in\rho(\sharp a)\Leftrightarrow\sharp x\in\rho(a)$, that is: at any time, the current value of $\sharp a$ coincides with the value of $a$ in the previous cycle.

The bead is endowed with two actuators. The first, named $\mathtt{rt}$, pushes the bead one unit to the right along the interval whenever it acts at time $t$, the only exception being the position $L$: if $\mathtt{rt}$ is applied there, its contribution to the motion of the bead is nil. Similarly, an actuator named $\mathtt{lt}$ pushes the bead one unit toward the endpoint $0$ of $[0,L]$, with no effect when the bead is already there. Finally, turning on both actuators at the same time results in no motion of the bead in either direction.

Two agents, also named $\mathtt{rt}$ and $\mathtt{lt}$ , are each in charge of deciding, respectively, whether to act (turn on their assigned actuator for the duration of one time interval), or not. Each agent $\alpha\in\{\mathtt{rt},\mathtt{lt}\}$ maintains two PCR representations: $G^{\alpha}$ is updated conditioned on the agent having acted, and $G^{\alpha{{}^{\scriptscriptstyle\ast}}}$ is updated conditioned on $\alpha$ resting.

Here we will consider the poc set representations we would like each agent to learn, as we attempt to draw their dual cubings. For this purpose, we analyze nesting relations in the sensorium

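$$\mathbf{\Sigma}_{0}=\{\mathbf{0},\mathbf{0}^{\ast}\}\cup\{a_{j},a_{j}^{\ast},\sharp a_{j},(\sharp a_{j})^{\ast}\}_{j=1}^{L}.$$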

In the absence of any additional assumptions, the following relations are consistent with the selected realization (and are, therefore, desirable as part of any learned poc set structure on $\mathbf{\Sigma}_{0}$): $a_{1}<a_{2}<\cdots<a_{L}$ encodes the geometry of the interval, and implies $\sharp a_{1}<\sharp a_{2}<\cdots<\sharp a_{L}$ as well; without any knowledge of the actions taken by the actuators, one can only be certain of the additional relations $(\ddagger)$ $\sharp a_{j}<a_{j+1}\,,\;a_{j}<\sharp a_{j+1}$, for $j<L$, which hold because the bead moves at most one unit per time step. Denote the resulting poc set structure on $\mathbf{\Sigma}_{0}$ by $\mathbf{P}_{0}$.
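The relations $(\ddagger)$, and the fact that $a_{j}<\sharp a_{j}$ is not forced, can be confirmed by brute force if, as a simplifying assumption not made in the text, the bead is restricted to integer positions and each history is reduced to its last two positions $(x_{-1},x_{0})$. A relation $x<y$ is then read as containment of realizations; the names below are hypothetical.

```python
from itertools import product

L = 4  # hypothetical size; the bead is restricted to integer positions 0..L

# A history is reduced to (previous, current) position. Without knowledge of
# the actuators, a step moves the bead by at most one unit.
HISTORIES = [(p, c) for p, c in product(range(L + 1), repeat=2) if abs(p - c) <= 1]

def rho(delayed, j, histories=HISTORIES):
    """Realization of a_j (pos < j), or of #a_j (previous pos < j) if delayed."""
    return frozenset(h for h in histories if h[0 if delayed else 1] < j)

def below(x, y, histories=HISTORIES):
    """x < y is consistent with the realization iff rho(x) is contained in rho(y);
    a sensor is written as a pair (delayed, j)."""
    return rho(*x, histories) <= rho(*y, histories)

for j in range(1, L):
    assert below((True, j), (False, j + 1))   # #a_j < a_{j+1}
    assert below((False, j), (True, j + 1))   # a_j < #a_{j+1}
print(below((False, 2), (True, 2)))           # False: a_j < #a_j is not forced
print(len(HISTORIES))                         # 13 = 3L + 1
```

The $3L+1$ admissible (previous, current) pairs are exactly the grid positions $(k,k^{\prime})$ with $|k-k^{\prime}|\leq 1$, which is the vertex count of the grid left over after imposing $(\ddagger)$.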

Leveraging our understanding of Cartesian products (Section A.2.2), we set $\mathbf{Q}$ to be the sub-poc set of $\mathbf{P}_{0}$ restricted to just the position sensors $a_{j}$, while $\sharp\mathbf{Q}$ will be the sub-poc set of $\mathbf{P}_{0}$ over the delayed position sensors $\sharp a_{j}$. Then the identity mapping $\mathtt{id}\colon\mathbf{Q}\vee\sharp\mathbf{Q}\to\mathbf{P}_{0}$ is a poc morphism, whose dual map is a median-preserving embedding of $\mathtt{Cube}\!\,(\mathbf{P}_{0})$, as a cubical complex, in the square $(L+1)\times(L+1)$ grid arising as $\mathtt{Cube}\!\,(\mathbf{Q})\times\mathtt{Cube}\!\,(\sharp\mathbf{Q})$. We conclude that $\mathtt{Cube}\!\,(\mathbf{P}_{0})$ is the cubical complex shown in Figure 12(left), obtained by applying the relations $(\ddagger)$ to erase redundant squares from the grid.

Now suppose that all the correct relations have been (somehow) learned and represented in the collection of PCRs $G^{\mathtt{rt}}$, $G^{\mathtt{rt}{{}^{\scriptscriptstyle\ast}}}$, $G^{\mathtt{lt}}$ and $G^{\mathtt{lt}{{}^{\scriptscriptstyle\ast}}}$. Because the synchronous application of $\mathtt{rt}$ and $\mathtt{lt}$ yields no motion, there is no way to discriminate between $G^{\mathtt{rt}}$ and $G^{\mathtt{lt}{{}^{\scriptscriptstyle\ast}}}$, nor between $G^{\mathtt{lt}}$ and $G^{\mathtt{rt}{{}^{\scriptscriptstyle\ast}}}$. However, the former two poc set presentations will have obtained the relations $a_{j}<\sharp a_{j}$ in addition to those of $\mathbf{P}_{0}$, causing the dual cubing to shrink even further, as highlighted on the right-hand side of Figure 12 (of course, a symmetric situation arises for the other pair of indistinguishable representations).

To end this section we note that, by hard-wiring the actuators to never execute both $\mathtt{rt}$ and $\mathtt{lt}$ at the same time, it is possible to disambiguate the representations. In this regime, the (optimally learned) representations $G^{\mathtt{rt}{{}^{\scriptscriptstyle\ast}}}$ and $G^{\mathtt{lt}{{}^{\scriptscriptstyle\ast}}}$ will remain the same, while the PCRs $G^{\mathtt{rt}}$ and $G^{\mathtt{lt}}$ will each experience a collapse to a non-trivial canonical quotient: the PCR $G^{\mathtt{rt}}$ witnesses $\sharp a_{j}$ if and only if it witnesses $a_{j+1}$ (the diagonal vertices in Figure 12 are inconsistent given $\mathtt{rt}$ is active). The situation is symmetric (but not identical) for $G^{\mathtt{lt}}$ , and we obtain four distinct “world views” for each of the observers.
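Both regimes can be checked in the same simplified setting (integer positions, histories reduced to the last two positions; assumptions not made in the text, with hypothetical names). While $\mathtt{rt}$ is active, or $\mathtt{lt}$ is resting, the bead never moves left, so every $a_{j}$ lies below $\sharp a_{j}$. Once the actuators are hard-wired never to fire together, an active $\mathtt{rt}$ forces $\sharp a_{j}$ and $a_{j+1}$ to have identical realizations.

```python
from itertools import product

L = 4  # hypothetical size; integer positions 0..L

def realization(histories, delayed, j):
    """Histories are (previous, current) pairs; a_j is pos < j, #a_j is previous pos < j."""
    return frozenset(h for h in histories if h[0 if delayed else 1] < j)

# rt active (lt possibly also active), or lt resting: the bead never moves left.
no_left = [(p, c) for p, c in product(range(L + 1), repeat=2) if c - p in (0, 1)]
# Hard-wired regime with rt active: lt rests, so the bead moves right unless at L.
rt_only = [(p, min(p + 1, L)) for p in range(L + 1)]

# a_j < #a_j in the first regime.
print(all(realization(no_left, False, j) <= realization(no_left, True, j)
          for j in range(1, L + 1)))                                       # True
# #a_j and a_{j+1} have identical realizations in the second regime.
print(all(realization(rt_only, True, j) == realization(rt_only, False, j + 1)
          for j in range(1, L)))                                           # True
```

In the second regime the "diagonal" histories $(p,p)$ with $p<L$ never occur, which is the collapse described above.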

A.3. Homotopy type of the observed space.

The phenomenon witnessed by the examples of Section A.2.3 is very general, and bears on the capabilities and limitations of knowledge representation using PCRs.

For a fixed PCS $\mathbf{\Sigma}$ , a fixed space $\mathbf{X}$ and PCS morphism $\rho:\mathbf{\Sigma}\to\mathbf{2}^{\mathbf{X}}$ , recall (Section 2.1.1) the subset $\mathbf{M}(\rho)$ of the Hamming cube $\mathbb{H}=\mathbb{H}(\mathbf{\Sigma})$ consisting of those models $u\in\mathbb{H}$ for which $\bigcap_{a\in u}\rho(a)$ is non-empty—the set of possible worlds with respect to $\rho$ . Let $\mathtt{Cube}\!\,(\rho)$ denote the cubical complex corresponding to the concept presentation of $\mathbf{M}(\rho)$ —the set of cubical faces of $\mathbb{H}$ all of whose vertices lie in $\mathbf{M}(\rho)$ .

The authors proved in [19] that, for sufficiently tame topological spaces $\mathbf{X}$ and PCS morphisms $\rho:\mathbf{\Sigma}\to\mathbf{2}^{\mathbf{X}}$ , the following holds:

Theorem A.9 (Recovery of Homotopy Type).

Suppose that, for every cube $C\in\mathtt{Cube}\!\,(\rho)$ , the set

[TABLE]

is contractible. Then $\mathtt{Cube}\!\,(\rho)$ is homotopy equivalent to $\mathbf{X}$ .∎

In other words, if the collection of queries available to the observer is sufficiently rich that obviously contractible subspaces of $\mathtt{Cube}\!\,(\rho)$ (cubes) are witnessed by contractible subspaces of $\mathbf{X}$ , then $\mathtt{Cube}\!\,(\rho)$ has, in the formal sense provided by algebraic topology, the same shape as the observed space $\mathbf{X}$ .

In particular, under the condition of the theorem, if $\mathbf{P}$ is a poc set structure on $\mathbf{\Sigma}$ and $\rho$ is a poc morphism, then the universality of representation by PCRs (Proposition 2.22) implies that $\mathtt{Cube}\!\,(\rho)\subseteq\mathtt{Cube}\!\,(\mathbf{P})$ , making $\mathtt{Cube}\!\,(\mathbf{P})$ into a minimal contractible model space for $\mathbf{P}$ housing a homotopy model of the observed space, and the discrepancy between the two is precisely the set of unobservable perceptual classes.

To illustrate the theorem, let us return to the examples of Section A.2.3 and observe that none of the phenomena encountered there happened by accident. For any $L\geq 2$, a choice of $0<\epsilon\leq\tfrac{\pi}{L}$ leads to $\mathtt{Cube}\!\,(\mathbf{P})$ being a tree (a ‘starfish’) containing the vertex $v=\{a_{j}{{}^{\scriptscriptstyle\ast}}\}_{j=0}^{L-1}$. Since the set of points in $\mathbb{S}^{1}$ witnessing this vertex is disconnected (see Figure 10), the hypothesis of the last theorem fails, making it possible for $\mathtt{Cube}\!\,(\rho)$ to be contractible, which is exactly what happened for $L=4$, $\epsilon=\tfrac{\pi}{4}$. By contrast, any choice of $\tfrac{\pi}{L}<\epsilon\leq\tfrac{\pi}{2}$ fulfills the hypothesis of the theorem, which is why, in the three other cases considered here, $\mathtt{Cube}\!\,(\rho)$ is homotopy equivalent to the circle.
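The threshold $\epsilon=\tfrac{\pi}{L}$ can be seen directly by computing the witness set of $v=\{a_{j}^{\ast}\}_{j=0}^{L-1}$, namely the points of $\mathbb{S}^{1}$ lying in none of the arcs. The sketch below measures angles in full turns, samples 48 equally spaced points (a sampling assumption that suffices here because 48 is a multiple of $2L$ for $L\in\{4,6\}$, so the isolated witnesses at $\epsilon=\tfrac{\pi}{L}$ are hit exactly), and counts the cyclic runs of witnessing samples; 0 means the vertex is unobservable. The function names are hypothetical.

```python
from fractions import Fraction

def circ_dist(s, t):
    """Distance on the circle, angles measured in full turns."""
    d = abs(s - t) % 1
    return min(d, 1 - d)

def witness_components(L, eps, M=48):
    """Number of cyclic runs of samples theta = j/M (in turns) lying in none of
    the L open arcs of half-width eps centred at k/L, i.e. witnessing the vertex
    {a_0*, ..., a_{L-1}*}; 0 means the vertex is not witnessed at all."""
    free = [all(circ_dist(Fraction(j, M), Fraction(k, L)) >= eps for k in range(L))
            for j in range(M)]
    if all(free):
        return 1
    # A run starts wherever a free sample follows a covered one (cyclically).
    return sum(1 for j in range(M) if free[j] and not free[j - 1])

# eps in turns: pi/4 -> 1/8, pi/3 -> 1/6, pi/2 -> 1/4; the threshold pi/L is 1/(2L).
for L, eps in [(4, Fraction(1, 8)), (4, Fraction(1, 6)), (6, Fraction(1, 6)), (6, Fraction(1, 4))]:
    print(L, eps, witness_components(L, eps))
# 4 1/8 4
# 4 1/6 0
# 6 1/6 0
# 6 1/4 0
```

Only the case $L=4$, $\epsilon=\tfrac{\pi}{4}$ leaves a disconnected witness set (four isolated points midway between consecutive markers); in the three other cases the vertex $v$ is unobservable.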

Finally, we would like to emphasize that, as in the examples considered above, neither the tameness assumptions on $\mathbf{X}$ and $\rho$ nor the hypothesis of the last theorem are excessive in standard Robotics settings. First, since sensor values are often functions of merely the last few visited states, the realization map $\rho:\mathbf{\Sigma}\to\mathbf{2}^{\mathbf{X}}$ will often factor, up to sufficient approximation, through $\mathbf{2}^{E\times\cdots\times E}$, where $E$ is the configuration space of the robotic system (similarly to the role played by the circle and the interval in all the preceding examples). Second, $E$ is often a manifold, possibly with corners, or a cellular complex; in the absence of chaotic behavior, and provided sufficient sensing, it becomes possible to construct a sufficiently fine mesh of sensor values for “chopping up” the reduced history space $E\times\cdots\times E$ into small contractible regions, as required by our theorem.

Appendix B: Basic Results about PCRs

B.1. Proof of Proposition 2.17.

Suppose $G$ is non-degenerate. Take any $S\in G^{\circ}$ and any $a\in\mathbf{\Sigma}$ . One of the following holds:

•

$S\cup\{a\}$ is coherent, and hence $a\in S$ (by the maximality property of $S$ ) and $a\not\leq a{{}^{\scriptscriptstyle\ast}}$ ;

•

$S\cup\{a{{}^{\scriptscriptstyle\ast}}\}$ is coherent, and hence $a{{}^{\scriptscriptstyle\ast}}\in S$ (again by maximality) and $a{{}^{\scriptscriptstyle\ast}}\not\leq a$;

•

Neither of the above.

In the third case there are two possibilities. Either $S$ is empty, in which case neither $\{a\}$ nor $\{a{{}^{\scriptscriptstyle\ast}}\}$ is coherent, so that both $a\leq a{{}^{\scriptscriptstyle\ast}}$ and $a{{}^{\scriptscriptstyle\ast}}\leq a$ hold, placing $a$ inside $N(G)\cap N(G){{}^{\scriptscriptstyle\ast}}$, a contradiction; or there exist $b,c\in S$ such that $a\leq b{{}^{\scriptscriptstyle\ast}}$ and $a{{}^{\scriptscriptstyle\ast}}\leq c{{}^{\scriptscriptstyle\ast}}$. Since $a{{}^{\scriptscriptstyle\ast}}\leq c{{}^{\scriptscriptstyle\ast}}$ is equivalent to $c\leq a$, we then have $c\leq a\leq b{{}^{\scriptscriptstyle\ast}}$, contradicting the coherence of $S$. Thus we are left with $a\in S$ or $a{{}^{\scriptscriptstyle\ast}}\in S$ for each $a\in\mathbf{\Sigma}$, as desired.

The second assertion trivially implies the third, and the third implying the first follows from the remark preceding Definition 2.16.∎

B.2. Proof of Proposition 2.18.

The map $\chi$ is injective: any $f\in\mathrm{Hom}_{\scriptscriptstyle{cg}}\!\left(G,\,\mathbf{2}\right)$ is a function from $\mathbf{\Sigma}$ to the two-point set $\mathbf{2}$, and is therefore determined by the (possibly empty) set of points at which it takes the value $\mathbf{1}$.

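Now let us verify that $\chi$ is well-defined, that is, that the set $S=f{{}^{\scriptscriptstyle-1}}(\mathbf{1})$ is a maximal coherent subset of $\mathbf{\Sigma}$ with respect to $G$. Indeed, were $a,b\in S$ such that $a\leq b{{}^{\scriptscriptstyle\ast}}$, this would force $\mathbf{1}=f(a)\leq f(b){{}^{\scriptscriptstyle\ast}}=\mathbf{0}$, a contradiction. Maximality follows because the identity $f(a{{}^{\scriptscriptstyle\ast}})=f(a){{}^{\scriptscriptstyle\ast}}$ places exactly one of $a,a{{}^{\scriptscriptstyle\ast}}$ in $S$ for every $a\in\mathbf{\Sigma}$, whereas no coherent set contains both $a$ and $a{{}^{\scriptscriptstyle\ast}}$ (as $a\leq a=(a{{}^{\scriptscriptstyle\ast}}){{}^{\scriptscriptstyle\ast}}$).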

Finally, we prove the surjectivity of $\chi$ . Given a maximal coherent set $S$ , Proposition 2.17 implies $S$ is a selection on $\mathbf{\Sigma}$ . This means that the function $f:\mathbf{\Sigma}\to\mathbf{2}$ defined by $f(a)=\mathbf{1}\Leftrightarrow a\in S$ satisfies the identity $f(a{{}^{\scriptscriptstyle\ast}})=f(a){{}^{\scriptscriptstyle\ast}}$ . We claim that $f$ is a PCR morphism. Since $\chi(f)=S$ , proving this claim will finish the proof of the current proposition.

Suppose $f$ is not a morphism. Then there is $ab\in G$ satisfying $f(a)\not\leq f(b)$ in $\mathbf{2}$. In the current setting this is tantamount to $f(b)=\mathbf{0}$ and $f(a)=\mathbf{1}$, or, equivalently, $f(b{{}^{\scriptscriptstyle\ast}})=f(a)=\mathbf{1}$. In turn, this means $a,b{{}^{\scriptscriptstyle\ast}}\in S$. However, $S$ is forward-closed (as is any maximal coherent set), so $a\in S$ and $ab\in G$ imply $b\in S$. With $b{{}^{\scriptscriptstyle\ast}}\in S$ we obtain a contradiction. ∎
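The correspondence of Propositions 2.17 and 2.18 can be checked by brute force on a toy example. The following Python sketch is illustrative only and rests on assumptions of our own: elements are strings with `a*` denoting the complement of `a`, the elements $\mathbf{0},\mathbf{1}$ are omitted for brevity, and the PCR is generated by a hypothetical chain $a\leq b\leq c$. The helpers `make_pcr`, `maximal_coherent_sets` and `morphism_supports` are hypothetical names, not code from the paper.

```python
from itertools import combinations


def star(x):
    """Complement in the PCS: 'a' <-> 'a*'."""
    return x[:-1] if x.endswith("*") else x + "*"


def make_pcr(atoms, gens):
    """Close generating edges under ab -> b*a*, then take the
    reflexive-transitive closure (the preorder <=_G)."""
    elems = [x for a in atoms for x in (a, star(a))]
    leq = {(x, x) for x in elems} | set(gens)
    leq |= {(star(b), star(a)) for a, b in gens}
    for k in elems:  # Warshall's transitive closure
        for i in elems:
            for j in elems:
                if (i, k) in leq and (k, j) in leq:
                    leq.add((i, j))
    return elems, leq


def coherent(S, leq):
    """S is G-coherent iff no x, y in S satisfy x <= y*."""
    return not any((x, star(y)) in leq for x in S for y in S)


def maximal_coherent_sets(elems, leq):
    coh = [frozenset(S) for r in range(len(elems) + 1)
           for S in combinations(elems, r) if coherent(S, leq)]
    return {S for S in coh if not any(S < T for T in coh)}


def morphism_supports(atoms, leq):
    """Supports f^{-1}(1) of maps f: Sigma -> 2 with f(a*) = f(a)* that
    preserve order; order-preservation means the support is forward-closed."""
    out = set()
    for r in range(len(atoms) + 1):
        for ones in combinations(atoms, r):
            S = frozenset(a if a in ones else star(a) for a in atoms)
            if all(y in S for (x, y) in leq if x in S):
                out.add(S)
    return out


atoms = ["a", "b", "c"]  # hypothetical chain a <= b <= c
elems, leq = make_pcr(atoms, [("a", "b"), ("b", "c")])
M = maximal_coherent_sets(elems, leq)
H = morphism_supports(atoms, leq)
print(len(M), M == H)  # 4 True
```

On this chain the four maximal coherent sets are exactly the forward-closed complete $\ast$-selections, in line with Proposition 2.18.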

B.3. Proof of Proposition 2.22.

The proof extends a standard argument from Sageev-Roller duality theory. Given $\mathbf{X}$ and $\rho$, pick any point $x\in\mathbf{X}$. By definition, $\xi=\rho{{}^{\scriptscriptstyle\ast}}(x)$ belongs to $G^{\circ}$ if and only if no $a,b\in\xi$ satisfy $a\leq b{{}^{\scriptscriptstyle\ast}}$ in $G$. Since $\rho$ is order-preserving, having $a\leq b{{}^{\scriptscriptstyle\ast}}$ for $a,b\in\xi$ would imply $\rho(a)\subseteq\rho(b{{}^{\scriptscriptstyle\ast}})=\mathbf{X}\smallsetminus\rho(b)$, that is, $\rho(a)\cap\rho(b)=\varnothing$, while at the same time $x\in\tilde{\rho}(a)\cap\tilde{\rho}(b)$, a contradiction. Thus, $\xi\in G^{\circ}$ for all choices of $x\in\mathbf{X}$, proving the first assertion of the proposition. To verify the second one, consider the choice of $\mathbf{X}=G^{\circ}$ with $\rho:\mathbf{\Sigma}\to\mathbf{2}^{\mathbf{X}}$ given by $\rho(a)=\left\{U\in G^{\circ}\left|a\in U\right.\right\}$. It is easily verified that $\rho$ is a morphism and that $\rho{{}^{\scriptscriptstyle\ast}}:\mathbf{X}\to G^{\circ}$ is the identity map (and hence surjective), finishing the proof.∎
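A small numerical illustration of the first assertion of Proposition 2.22, again only a sketch under assumptions of our own: the observation space $\mathbf{X}$ is a hypothetical set of eight points, each sensor `s` is realized by a subset `rho[s]` (with `rho["s*"]` its complement), and the PCR records only some of the inclusions between these subsets. The code checks that `rho` is order-preserving and that every dual point $\rho{{}^{\scriptscriptstyle\ast}}(x)$ is coherent.

```python
def star(x):
    """Complement: 's' <-> 's*'."""
    return x[:-1] if x.endswith("*") else x + "*"


# Hypothetical observation space and sensors: rho[s] is where sensor s fires.
X = set(range(8))
base = {
    "a": {x for x in X if x >= 2},
    "b": {x for x in X if x >= 4},
    "c": {x for x in X if x >= 6},
    "d": {x for x in X if x % 2 == 0},
}
rho = dict(base)
rho.update({star(s): X - v for s, v in base.items()})  # rho(s*) = X \ rho(s)
elems = list(rho)

# A PCR recording some of the inclusions rho(c) <= rho(b) <= rho(a).
gens = [("c", "b"), ("b", "a")]
leq = {(s, s) for s in elems} | set(gens)
leq |= {(star(q), star(p)) for p, q in gens}
for k in elems:  # transitive closure
    for i in elems:
        for j in elems:
            if (i, k) in leq and (k, j) in leq:
                leq.add((i, j))

# rho is a morphism: p <= q in G implies rho(p) is a subset of rho(q).
is_morphism = all(rho[p] <= rho[q] for p, q in leq)


def coherent(S):
    """No p, q in S with p <= q*."""
    return not any((p, star(q)) in leq for p in S for q in S)


def rho_dual(x):
    """rho*(x): the set of sensors witnessing the point x."""
    return frozenset(s for s in elems if x in rho[s])


print(is_morphism, all(coherent(rho_dual(x)) for x in X))  # True True
```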

B.4. Proof of Proposition 2.24.

Let $G$ be a fixed non-degenerate PCR over $\mathbf{\Sigma}$ . For every $a\in\mathbf{\Sigma}$ , recall $[a]_{G}=a\!\uparrow\cap a\!\downarrow$ , and recall the definition of $\pi=\pi_{G}$ :

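$$\pi(a)=\begin{cases}N(G), & a\in N(G),\\ N(G){{}^{\scriptscriptstyle\ast}}, & a\in N(G){{}^{\scriptscriptstyle\ast}},\\ [a]_{G}, & \text{otherwise.}\end{cases}$$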

Here are a few natural observations:

•

For any $a\in\mathbf{\Sigma}$, $[a]_{G}{{}^{\scriptscriptstyle\ast}}=[a{{}^{\scriptscriptstyle\ast}}]_{G}$, where for any set $S\subseteq\mathbf{\Sigma}$ we recall that $S{{}^{\scriptscriptstyle\ast}}:=\left\{a{{}^{\scriptscriptstyle\ast}}\left|a\in S\right.\right\}$.

•

Since $N(G)$ is backwards-closed, $N(G)$ is a union of strong components of $G$ : indeed, if $a\in N(G)$ then every $b\in[a]_{G}$ satisfies $b\leq a$ , which implies $b\in N(G)$ ; hence $[a]_{G}\subseteq N(G)$ .

•

The same holds for $N(G){{}^{\scriptscriptstyle\ast}}$, which is forward-closed.

This allows for the construction of a new PCR $\widehat{G}$ over the PCS $\widehat{\mathbf{\Sigma}}:=\left\{\pi(a)\left|a\in\mathbf{\Sigma}\right.\right\}$ , by setting $\widehat{G}=\left\{\pi_{G}(a)\pi_{G}(b)\left|a\leq_{G}b\right.\right\}$ . We claim that $\widehat{G}$ induces on $\widehat{\mathbf{\Sigma}}$ the structure of a poc set. For this it will suffice to show that $\widehat{G}$ is a non-degenerate PCR and a partial order.

First we show that $\widehat{\mathbf{\Sigma}}$ is a PCS. The identity $\pi(a{{}^{\scriptscriptstyle\ast}})=\pi(a){{}^{\scriptscriptstyle\ast}}$ yields $\pi(a){{}^{\scriptscriptstyle\ast}}{{}^{\scriptscriptstyle\ast}}=\pi(a{{}^{\scriptscriptstyle\ast}}{{}^{\scriptscriptstyle\ast}})=\pi(a)$ for all $a\in\mathbf{\Sigma}$. Suppose some $a\in\mathbf{\Sigma}$ satisfied $\pi(a){{}^{\scriptscriptstyle\ast}}=\pi(a)$. Since $G$ is non-degenerate, the sets $N(G)$ and $N(G){{}^{\scriptscriptstyle\ast}}$ are disjoint, so this forces $a\notin N(G)$ and $a\notin N(G){{}^{\scriptscriptstyle\ast}}$. But then $\pi(a)=[a]_{G}$ and $\pi(a{{}^{\scriptscriptstyle\ast}})=[a{{}^{\scriptscriptstyle\ast}}]_{G}$, and the equality $[a{{}^{\scriptscriptstyle\ast}}]_{G}=[a]_{G}$ implies both $a\leq a{{}^{\scriptscriptstyle\ast}}$ and $a{{}^{\scriptscriptstyle\ast}}\leq a$, contradicting non-degeneracy. We conclude that $\pi(a){{}^{\scriptscriptstyle\ast}}\neq\pi(a)$ for all $a\in\mathbf{\Sigma}$, and, since $\pi$ is surjective, $\widehat{\mathbf{\Sigma}}$ is a PCS.

By construction, $\widehat{G}$ is a PCR. Suppose now that $A\in\widehat{\mathbf{\Sigma}}$ lay in $N(\widehat{G})$. Then $A\leq A{{}^{\scriptscriptstyle\ast}}$, and writing $A=\pi(a)$ with $a\in\mathbf{\Sigma}$ we obtain $a\in N(G)$, showing that $A=N(G)$. Thus, $N(\widehat{G})$ is trivial, as desired.

$\widehat{G}$ is partially ordered by general considerations, so to conclude that $\widehat{G}$ is a poc set, it remains to verify that $\widehat{\mathbf{0}}=N(G)$ is its minimum. Now, $\mathbf{0}\in N(G)$ and $\widehat{\mathbf{0}}=\pi(\mathbf{0})=N(G)\in\widehat{\mathbf{\Sigma}}$, together with $\mathbf{0}\leq_{G}a$, imply that $\widehat{\mathbf{0}}\pi(a)\in\widehat{G}$ for all $a\in\mathbf{\Sigma}$. Since $\pi$ is surjective, $\widehat{\mathbf{0}}$ is the minimum element of $\widehat{\mathbf{\Sigma}}$ with respect to the new partial order.

Finally, let $\mathbf{P}$ be any poc set, and let $f:G\to\mathbf{P}$ be any PCR morphism. Then $f$ is constant on $\pi(a)$ for all $a\in\mathbf{\Sigma}$, which yields a well-defined set map $\widehat{f}:\widehat{\mathbf{\Sigma}}\to\mathbf{P}$ via $\widehat{f}(\pi(a))=f(a)$. This map is a PCR morphism of complemented graphs by construction, and is, therefore, a morphism of $\widehat{G}$ into $\mathbf{P}$. If $f^{\prime}:\widehat{G}\to\mathbf{P}$ is any poc morphism satisfying $f=f^{\prime}\circ\pi$, then for any $a\in\mathbf{\Sigma}$ we have $f^{\prime}(\pi(a))=f(a)=\widehat{f}(\pi(a))$. Since $\pi$ is surjective, $f^{\prime}$ coincides with $\widehat{f}$.∎
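The quotient map $\pi_{G}$ can be computed directly from the preorder $\leq_{G}$. In the hypothetical example below, strings encode elements, `x*` is the complement of `x`, and $\mathbf{0},\mathbf{1}$ are omitted (so the class $N(G)$ is not a minimum here); $a$ and $b$ form a strong component, while the edges $e\leq e{{}^{\scriptscriptstyle\ast}}$ and $f\leq e$ put both $e$ and $f$ into $N(G)$. The sketch checks that $\pi$ commutes with complementation and that the induced relation on the classes is antisymmetric; `poc_quotient` is our own name.

```python
def star(x):
    """Complement: 'x' <-> 'x*'."""
    return x[:-1] if x.endswith("*") else x + "*"


def closure(elems, edges):
    """Reflexive-transitive closure of the edge set (the preorder <=_G)."""
    leq = {(x, x) for x in elems} | set(edges)
    for k in elems:
        for i in elems:
            for j in elems:
                if (i, k) in leq and (k, j) in leq:
                    leq.add((i, j))
    return leq


def poc_quotient(elems, leq):
    """pi_G: collapse N(G), N(G)* and the strong components [x]_G."""
    N = frozenset(x for x in elems if (x, star(x)) in leq)
    N_star = frozenset(star(x) for x in N)
    if N & N_star:
        raise ValueError("G is degenerate")
    pi = {}
    for x in elems:
        if x in N:
            pi[x] = N
        elif x in N_star:
            pi[x] = N_star
        else:  # the strong component of x
            pi[x] = frozenset(y for y in elems
                              if (x, y) in leq and (y, x) in leq)
    return pi


atoms = ["a", "b", "c", "e", "f"]  # hypothetical example
elems = [x for a in atoms for x in (a, star(a))]
gens = [("a", "b"), ("b", "a"), ("b", "c"), ("e", "e*"), ("f", "e")]
edges = set(gens) | {(star(q), star(p)) for p, q in gens}
leq = closure(elems, edges)
pi = poc_quotient(elems, leq)

classes = set(pi.values())
induced = {(pi[p], pi[q]) for p, q in leq}  # relation induced on the classes
print(len(classes), sorted(pi["f"]))  # 6 ['e', 'f']
```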

B.5. Proof of Corollary 2.25.

Let $G$ be a PCR over a PCS $\mathbf{\Sigma}$ , and let $\pi_{G}:G\to\widehat{G}$ be the canonical quotient map. We apply Proposition 2.24 with $\mathbf{P}=\mathbf{2}$ .

For any morphism $\varphi:G\to\mathbf{2}$ there exists one and only one morphism $\widehat{\varphi}:\widehat{G}\to\mathbf{2}$ satisfying $\varphi=\widehat{\varphi}\circ\pi_{G}$ . Now, thinking of $\widehat{\varphi}$ as an element of $\widehat{G}^{\circ}$ , we may write, by Propositions 2.18 and 2.20, $\pi_{G}^{\circ}(\widehat{\varphi})=\widehat{\varphi}\circ\pi_{G}=\varphi$ .

On the other hand, for any $\psi:\widehat{G}\to\mathbf{2}$, we may write $\widehat{\pi_{G}^{\circ}(\psi)}=\widehat{\psi\circ\pi_{G}}=\psi$, the last equality following from the uniqueness assertion of Proposition 2.24, applied to the morphism $\psi\circ\pi_{G}$.

We conclude that the map $\mathrm{Hom}_{\scriptscriptstyle{cg}}\!\left(G,\,\mathbf{2}\right)\to\mathrm{Hom}_{\scriptscriptstyle{cg}}\!\left(\widehat{G},\,\mathbf{2}\right)$ defined by $\varphi\mapsto\widehat{\varphi}$ is an inverse of $\pi_{G}^{\circ}:\mathrm{Hom}_{\scriptscriptstyle{cg}}\!\left(\widehat{G},\,\mathbf{2}\right)\to\mathrm{Hom}_{\scriptscriptstyle{cg}}\!\left(G,\,\mathbf{2}\right)$ , as desired.∎

B.6. Proof of Corollary 2.26.

We apply Proposition 2.24 again, to the PCR $G$ , the poc set $\mathbf{P}=\widehat{H}$ and the morphism $g:=\pi_{H}\circ f:G\to\widehat{H}$ , to conclude there exists one and only one morphism $\widehat{g}:\widehat{G}\to\widehat{H}$ satisfying $g=\widehat{g}\circ\pi_{G}$ . Substituting $g=\pi_{H}\circ f$ we see that $\widehat{g}$ is the required morphism.∎

Appendix C: Convexity theory of PCR duals

The purpose of this section is to provide a self-contained account of our results regarding coherent projection and the use of propagation for the computation of nearest-point projections in poc set duals. Throughout this section, $G$ will be a fixed non-degenerate PCR over a finite PCS $\mathbf{\Sigma}$ . Moreover, without loss of generality (through replacing $G$ with its canonical poc quotient $\widehat{G}$ ), we may assume $G$ is a poc set. Since $G$ is fixed, we will simplify notation by writing $(\leq)$ instead of $(\leq_{G})$ , ‘coherent’ instead of ‘ $G$ -coherent’, and so on, throughout this section.

C.1. Proof of Lemma 2.37:

Property (1) is a restatement of the fact that, if $G$ is non-degenerate, then every coherent subset of $\mathbf{\Sigma}$ is contained in a coherent complete $\ast$ -selection. Items (2,3) are straightforward from the definition.

For item (4), observe that, if $u\in\mathfrak{h}(a;G)$ and $a\leq b$ , then $b\in u$ as well: indeed, if $b\notin u$ , then $b{{}^{\scriptscriptstyle\ast}}\in u$ and $a\leq(b{{}^{\scriptscriptstyle\ast}}){{}^{\scriptscriptstyle\ast}}$ means $u$ is incoherent. We conclude that $a\leq b$ implies $\mathfrak{h}(a;G)\subseteq\mathfrak{h}(b;G)$ , from which (4) readily follows.

To prove item (5), we observe that $S=S\!\uparrow$ implies $S=\min(S)\!\uparrow$ , and then we apply (4).

Finally, one direction of (6) amounts to (3). To prove the converse, suppose $S_{1},S_{2}\in\mathbf{C}(G)$ are such that $(\dagger)$ $\mathfrak{h}(S_{1};G)\subseteq\mathfrak{h}(S_{2};G)$ , and suppose there exists $s\in S_{2}\smallsetminus S_{1}$ . If there is a $u\in\mathfrak{h}(S_{1};G)$ such that $s\in\min(u)$ , then its neighbor $v:=\left[u\right]_{{}_{s}}$ in $\mathtt{Dual}\!\left(G\right)$ (recall Lemma A.4) contains $S_{1}$ but not $S_{2}$ , contradicting $(\dagger)$ . We are left to prove that $\mathfrak{h}(S_{1};G)$ must contain such a $u$ .

Indeed, pick any $v\in\mathfrak{h}(S_{1};G)$ for which the number $N(v)$ of elements $a\in v$ satisfying $a<s$ is smallest possible. Since $s\in S_{2}$, we have $s\in v$, by $(\dagger)$. If $s\in\min(v)$ we are done; otherwise $N(v)>0$ and we may find an $a\in\min(v)$ with $a<s$. Consider the vertex $\left[v\right]_{{}_{a}}$: since $a<s$, $s\notin S_{1}$ and $S_{1}=S_{1}\!\uparrow$, we conclude that $a\notin S_{1}$, so $S_{1}\subset\left[v\right]_{{}_{a}}$ and $s\in\left[v\right]_{{}_{a}}$, with $N(\left[v\right]_{{}_{a}})=N(v)-1$. This contradicts the choice of $v$, and we are done.∎

C.2. Proof of Proposition 3.1:

Suppose $B\in G^{\circ}$ is such that $\mathbf{\Delta}\!\left(A,B\right)\leq\mathbf{\Delta}\!\left(A,u\right)$ for all $u\in G^{\circ}$ . We must show that $\mathtt{coh}(A)\subseteq B$ .

Suppose $a\in\mathtt{coh}(A)\smallsetminus B$ . Then $a{{}^{\scriptscriptstyle\ast}}\in B$ and there is an element $b\in\min(B)$ with $b\leq a{{}^{\scriptscriptstyle\ast}}$ . Note that $u:=\left[B\right]_{{}_{b}}$ is then also an element of $G^{\circ}$ , by Lemma A.4.

Now, if $b\in A$ , then $a\in A\!\uparrow{{}^{\scriptscriptstyle\ast}}$ , contradicting $a\in\mathtt{coh}(A)=A\!\uparrow\smallsetminus A\!\uparrow{{}^{\scriptscriptstyle\ast}}$ . Therefore, $b{{}^{\scriptscriptstyle\ast}}\in A$ (since $A$ is a complete $\ast$ -selection), but then $u$ satisfies:

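$$\mathbf{\Delta}\!\left(A,u\right)<\mathbf{\Delta}\!\left(A,B\right),$$

since $u$ is obtained from $B$ by replacing $b\notin A$ with $b{{}^{\scriptscriptstyle\ast}}\in A$. This is a contradiction again, and we conclude that $\mathtt{coh}(A)\subseteq B$, as desired.∎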

C.3. Proof of Proposition 3.2:

Recall that $A\subseteq A\!\uparrow$ , $A\!\uparrow\!\uparrow=A\!\uparrow$ and $A{{}^{\scriptscriptstyle\ast}}\!\downarrow=A\!\uparrow{{}^{\scriptscriptstyle\ast}}$ for all $A\subseteq\mathbf{\Sigma}$ . We check that $\mathtt{coh}(A)$ is coherent for all $A$ . For suppose that $b,c\in\mathtt{coh}(A)$ satisfy $b\leq c{{}^{\scriptscriptstyle\ast}}$ . Then $b\in A\!\uparrow$ implies $c{{}^{\scriptscriptstyle\ast}}\in A\!\uparrow\!\uparrow=A\!\uparrow$ , and therefore $c\in A\!\uparrow{{}^{\scriptscriptstyle\ast}}$ . But then $c$ cannot lie in $\mathtt{coh}(A)$ .

Next, we verify that $\mathtt{coh}(A)$ is forward-closed. It suffices to verify $\mathtt{coh}(A)\!\uparrow\subseteq\mathtt{coh}(A)$ . By definition we have $\mathtt{coh}(A)\subseteq A\!\uparrow$ , hence $\mathtt{coh}(A)\!\uparrow\subseteq A\!\uparrow\!\uparrow=A\!\uparrow$ , and it remains to check that no $b\in\mathtt{coh}(A)\!\uparrow$ belongs to $A\!\uparrow{{}^{\scriptscriptstyle\ast}}$ ; were there such a $b$ , there would have been $a\in\mathtt{coh}(A),\,c\in A$ with $a\leq b$ and $c\leq b{{}^{\scriptscriptstyle\ast}}$ , implying $a\leq c{{}^{\scriptscriptstyle\ast}}$ — a contradiction to $a\notin A{{}^{\scriptscriptstyle\ast}}\!\downarrow=A\!\uparrow{{}^{\scriptscriptstyle\ast}}$ . This proves (a).

Now let us calculate: $\mathtt{coh}(\mathtt{coh}(A))=\mathtt{coh}(A)\!\uparrow\smallsetminus\mathtt{coh}(A)\!\uparrow{{}^{\scriptscriptstyle\ast}}=\mathtt{coh}(A)\smallsetminus\mathtt{coh}(A){{}^{\scriptscriptstyle\ast}}=\mathtt{coh}(A)$ , the last equality due to $\mathtt{coh}(A)$ being coherent. At the same time, if $A$ itself is coherent then $\mathtt{coh}(A)=A\!\uparrow\supseteq A$ . Moreover, this shows $\mathtt{coh}(A)=A$ whenever $A\in\mathbf{C}(G)$ . Finally, if $A=\mathtt{coh}(A)$ then $A\in\mathbf{C}(G)$ because $\mathtt{coh}(A)$ always is.∎
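The coherent projection $\mathtt{coh}(A)=A\!\uparrow\smallsetminus A\!\uparrow{{}^{\scriptscriptstyle\ast}}$ is easy to compute on a finite example. The Python sketch below uses a hypothetical chain $a\leq b\leq c$ (with `x*` standing for the complement of `x`) and checks the conclusions of Proposition 3.2 exhaustively over all subsets $A\subseteq\mathbf{\Sigma}$; the helper names `up` and `coh` are ours.

```python
from itertools import combinations


def star(x):
    """Complement: 'x' <-> 'x*'."""
    return x[:-1] if x.endswith("*") else x + "*"


atoms = ["a", "b", "c"]
elems = [x for a in atoms for x in (a, star(a))]
gens = [("a", "b"), ("b", "c")]  # hypothetical chain a <= b <= c
leq = {(x, x) for x in elems} | set(gens)
leq |= {(star(q), star(p)) for p, q in gens}
for k in elems:  # transitive closure
    for i in elems:
        for j in elems:
            if (i, k) in leq and (k, j) in leq:
                leq.add((i, j))


def up(A):
    """Forward closure of A."""
    return frozenset(y for (x, y) in leq if x in A)


def coh(A):
    """Coherent projection: the forward closure minus its complement."""
    U = up(A)
    return U - frozenset(star(x) for x in U)


def coherent(S):
    return not any((x, star(y)) in leq for x in S for y in S)


subsets = [frozenset(S) for r in range(len(elems) + 1)
           for S in combinations(elems, r)]
print(sorted(coh({"a", "b*"})))  # ['c']
print(all(coh(coh(A)) == coh(A) for A in subsets))  # True
```

For instance, $\{a,b{{}^{\scriptscriptstyle\ast}}\}$ is incoherent, and its coherent projection keeps only $c$, the one element of its forward closure whose complement is not also forced.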

C.4. Proof of Proposition 2.41:

The proof of the projection formula will require additional notions and results from [43], which we now recall.

C.4.1. Separators and Gates

Definition C.1.

For any $K,L\subseteq G^{\circ}$ , the set

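$$\mathtt{sep}\!\left(K,L\right):=\left\{a\in\mathbf{\Sigma}\,\middle|\,K\subseteq\mathfrak{h}(a;G)\ \text{and}\ L\subseteq\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}};G)\right\}$$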

is called the separator of $K$ and $L$ in $G^{\circ}$ .∎

The inequality $\mathbf{\Delta}\!\left(u,v\right)\geq\left|\mathtt{sep}\!\left(K,L\right)\right|$ follows immediately for all $u\in K$ and $v\in L$ . This motivates:

Definition C.2.

Let $K,L\subseteq G^{\circ}$ . A gate for $K,L$ is a pair of points $u\in K$ , $v\in L$ such that $\mathbf{\Delta}\!\left(u,v\right)=\left|\mathtt{sep}\!\left(K,L\right)\right|$ .∎

The following result is well known in our setting:

Proposition C.3.

Let $K,L$ be non-empty convex subsets of a median graph and let $u\in K$ and $v\in L$ . Then $u,v$ form a gate for $K,L$ if and only if $\mathtt{proj}_{K}{v}=u$ and $\mathtt{proj}_{L}{u}=v$ . Moreover, the pair $K,L$ has a gate.∎

We will apply this proposition without proof. An important consequence for us is the following:

Lemma C.4.

Suppose $S\subset\mathbf{\Sigma}$ is coherent, and $K=\mathfrak{h}(S;G)$ . Then, for any $a\in\mathbf{\Sigma}$ , if $K\subseteq\mathfrak{h}(a;G)$ then there exists $s\in S$ such that $s\leq a$ .

Proof.

Let $u\in K$ and $v\in L:=\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}};G)$ form a gate. Since $v\notin K$ (as $K\subseteq\mathfrak{h}(a;G)$), there exists $s\in S$ such that $v\in\mathfrak{h}(s{{}^{\scriptscriptstyle\ast}};G)$.

Suppose there existed a $w\in L$ with $w\in\mathfrak{h}(s;G)$, and consider $m=\mathrm{med}\!\left(u,v,w\right)$. Then $a{{}^{\scriptscriptstyle\ast}}\in v,w$ implies $a{{}^{\scriptscriptstyle\ast}}\in m$, so $m\in L$; and since the median lies on a geodesic from $u$ to $v$, the inequality

$$\mathbf{\Delta}\!\left(u,m\right)\leq\mathbf{\Delta}\!\left(u,v\right)$$

implies $m=v$, since $v=\mathtt{proj}_{L}{u}$. On the other hand, $s\in u,w$ implies $s\in m$, hence $s\in v$, a contradiction.

We have shown that $L=\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}};G)$ is contained in $\mathfrak{h}(s{{}^{\scriptscriptstyle\ast}};G)$. Equivalently, $a{{}^{\scriptscriptstyle\ast}}\leq s{{}^{\scriptscriptstyle\ast}}$, which is the same as $s\leq a$.∎

Lemma C.5.

Suppose $K,L$ are non-empty convex subsets of $\mathtt{Dual}\!\left(G\right)$ . If $K\cap L\neq\varnothing$ , then $\mathtt{proj}_{K}{L}=\mathtt{proj}_{L}{K}=K\cap L$ .

Proof.

Clearly, if $v\in K\cap L$ then $\mathtt{proj}_{L}(v)=v$, so $K\cap L\subset\mathtt{proj}_{L}{K}$. For the reverse inclusion, suppose $v\in\mathtt{proj}_{L}{K}$ and write $v=\mathtt{proj}_{L}{u}$, $u\in K$. Pick any point $w\in K\cap L$. Setting $m=\mathrm{med}\!\left(w,v,u\right)$, we note that $m\in L$ (because $w,v\in L$) and, since $m$ lies on a geodesic from $u$ to $v$,

$$\mathbf{\Delta}\!\left(u,m\right)\leq\mathbf{\Delta}\!\left(u,v\right).$$

The uniqueness of projection forces $v=\mathtt{proj}_{L}{u}$ to coincide with $m$. Moreover, since $w,u\in K$ we also have $m\in K$, showing $v\in K\cap L$.∎

We are now ready for the proof of one more lemma.

C.4.2. Proof of Lemma 2.38.

Since $T\in\mathbf{C}(G)$ and $u$ is a complete $\ast$ -selection, we have $\mathtt{sep}\!\left(\mathfrak{h}(T),u\right)=u{{}^{\scriptscriptstyle\ast}}\cap T=T\smallsetminus u$ . Since $S\subseteq u$ , we have $T\smallsetminus u\subseteq T\smallsetminus S$ . Overall, this yields $\mathbf{\Delta}\!\left(u,\mathfrak{h}(T)\right)=\left|\mathtt{sep}\!\left(u,\mathfrak{h}(T)\right)\right|\leq\left|T\smallsetminus S\right|$ , as required.∎

C.4.3. Computing Nearest Point Projection Maps

We now offer an explicit construction of a geodesic path in $\mathtt{Dual}\!\left(G\right)$ emanating from a given vertex $u$ and terminating at its unique nearest point in a specified convex target set:

Proposition C.6.

Suppose $u\in G^{\circ}$ is a vertex. Let $T\subseteq\mathbf{\Sigma}$ be a coherent subset. Then the following algorithm constructs a shortest path in $\mathtt{Dual}\!\left(G\right)$ from $u$ to $K=\mathfrak{h}(T;G)$ :

(1)

Find an element $b\in T\smallsetminus u$; if there is no such element, stop and output $u$.

(2)

Find an element $c\leq b{{}^{\scriptscriptstyle\ast}}$ with $c\in\min(u)$;

(3)

Replace $u$ by $\left[u\right]_{{}_{c}}$ and return to the first step.

Proof.

We have $u\in K$ if and only if $T\subset u$ , which provides the stopping condition for the algorithm. Now, if $u\notin K$ and $b\in T\smallsetminus u$ then for all $v\in K$ one has $v\in\mathfrak{h}(b;G)$ and $u\in\mathfrak{h}(b{{}^{\scriptscriptstyle\ast}};G)$ . Since $c\leq b{{}^{\scriptscriptstyle\ast}}$ , we have $u\in\mathfrak{h}(c;G)\subseteq\mathfrak{h}(b{{}^{\scriptscriptstyle\ast}};G)$ , implying $v\in\mathfrak{h}(c{{}^{\scriptscriptstyle\ast}};G)$ and $c\in u\smallsetminus v$ . As a result:

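$$\mathbf{\Delta}\!\left(\left[u\right]_{{}_{c}},v\right)=\mathbf{\Delta}\!\left(u,v\right)-1\quad\text{for all }v\in K.$$

Having reduced $\mathbf{\Delta}\!\left(u,v\right)$ by a unit for all $v\in K$, we have reduced $\mathbf{\Delta}\!\left(u,K\right)$ by a unit as well.∎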
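The walk of Proposition C.6 translates almost verbatim into code. The sketch below is a hypothetical implementation on the dual of the chain $a\leq b\leq c$ (strings encode elements, `x*` is the complement of `x`); ties in steps (1) and (2) are broken alphabetically, which is our choice and not part of the proposition. As an extra check we compare the endpoint with $(u\smallsetminus T{{}^{\scriptscriptstyle\ast}}\!\downarrow)\cup T\!\uparrow$, our reading of the closest-point formula of Corollary C.7 as suggested by its inductive proof; treat that comparison as an assumption.

```python
def star(x):
    """Complement: 'x' <-> 'x*'."""
    return x[:-1] if x.endswith("*") else x + "*"


atoms = ["a", "b", "c"]
elems = [x for a in atoms for x in (a, star(a))]
gens = [("a", "b"), ("b", "c")]  # hypothetical poc set: chain a <= b <= c
leq = {(x, x) for x in elems} | set(gens)
leq |= {(star(q), star(p)) for p, q in gens}
for k in elems:  # transitive closure
    for i in elems:
        for j in elems:
            if (i, k) in leq and (k, j) in leq:
                leq.add((i, j))


def minimal(u):
    """Elements of u with no strictly smaller element in u."""
    return {x for x in u if not any(y != x and (y, x) in leq for y in u)}


def flip(u, c):
    """The neighbour [u]_c: replace c by c*."""
    return (u - {c}) | {star(c)}


def project(u, T):
    """Walk from vertex u towards h(T; G), following steps (1)-(3)."""
    path = [frozenset(u)]
    while True:
        u = path[-1]
        missing = sorted(set(T) - u)
        if not missing:  # step (1): T is contained in u, stop
            return path
        b = missing[0]
        # step (2): a minimal element c of u with c <= b*
        c = sorted(x for x in minimal(u) if (x, star(b)) in leq)[0]
        path.append(flip(u, c))  # step (3)


def projection_formula(u, T):
    """Assumed closed form: (u minus T*-down) union T-up."""
    T_up = {y for (x, y) in leq if x in T}
    T_star_down = {star(y) for y in T_up}  # T*-down equals (T-up)*
    return (frozenset(u) - T_star_down) | T_up


path = project({"a*", "b*", "c*"}, {"a"})
print(len(path) - 1, sorted(path[-1]))  # 3 ['a', 'b', 'c']
```

Each step changes exactly one coordinate, so the number of steps equals the distance from the starting vertex to the target halfspace intersection.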

Corollary C.7 (Projection of a Point).

Let $G$ and $T$ be as above. Then the closest point projection to $K=\mathfrak{h}(T;G)$ is given by the formula:

[TABLE]

Proof.

Note that the second equality follows from the DeMorgan rules and the fact that $T\!\uparrow\cap T{{}^{\scriptscriptstyle\ast}}\!\downarrow=\varnothing$ , since $T$ is coherent. We now prove the first equality.

Set $K=\mathfrak{h}(T;G)$ and proceed by induction on $\mathbf{\Delta}\!\left(u,K\right)$ . If $\mathbf{\Delta}\!\left(u,K\right)=0$ , then $u\in K$ and therefore $T\subset u$ . In addition, $u$ is coherent and we conclude $T{{}^{\scriptscriptstyle\ast}}\!\downarrow\cap u=\varnothing$ , leaving us with

[TABLE]

as desired. Now suppose $n:=\mathbf{\Delta}\!\left(u,K\right)>0$ . By the preceding proposition, there is $a\in T{{}^{\scriptscriptstyle\ast}}\!\downarrow\cap u$ such that $v:=\left[u\right]_{{}_{a}}\in G^{\circ}$ , $\mathbf{\Delta}\!\left(v,K\right)=n-1$ , and $\mathtt{proj}_{K}{u}=\mathtt{proj}_{K}{v}$ . We thus have:

[TABLE]

the last equality being due to $a\in T{{}^{\scriptscriptstyle\ast}}\!\downarrow$ and $a{{}^{\scriptscriptstyle\ast}}\in T\!\uparrow$.∎

C.4.4. Projecting a Convex Set to a Convex Set

Proposition C.8.

Let $K,L$ be non-empty convex subsets of $\mathtt{Dual}\!\left(G\right)$ with $L=\mathfrak{h}(S;G)$ and $K=\mathfrak{h}(T;G)$ . Then:

[TABLE]

Proof.

Since $T$ is coherent, $T\!\uparrow$ and $T{{}^{\scriptscriptstyle\ast}}\!\downarrow=T\!\uparrow{{}^{\scriptscriptstyle\ast}}$ are disjoint. This allows us to write:

[TABLE]

and the second equality in Equation 50 follows from the identity $\mathfrak{h}(T;G)=\mathfrak{h}(T\!\uparrow;G)$ . Denote $R=S\!\uparrow\smallsetminus T\!\uparrow{{}^{\scriptscriptstyle\ast}}$ and $N=\mathfrak{h}(R;G)$ .

For every $u\in L=\mathfrak{h}(S;G)$ we have $S\!\uparrow\subset u$ , implying $\mathtt{proj}_{K}{u}$ contains $T\!\uparrow\cup R$ , by Corollary C.7. Thus, $\mathtt{proj}_{K}{L}\subset K\cap N$ , as required.

For the converse, observe that the case $K\cap L\neq\varnothing$ was already dealt with in Lemma C.5: if $K\cap L\neq\varnothing$ , then

[TABLE]

In particular, $S\!\uparrow\cup T\!\uparrow$ is coherent, and hence does not intersect $T{{}^{\scriptscriptstyle\ast}}\!\downarrow$, and the formula Equation 50 holds.

Thus we may henceforth assume $K\cap L=\varnothing$. Equivalently, $S\!\uparrow\cap T{{}^{\scriptscriptstyle\ast}}\!\downarrow\neq\varnothing$. In fact, by Lemma C.4 we have $S\!\uparrow\cap T{{}^{\scriptscriptstyle\ast}}\!\downarrow=\mathtt{sep}\!\left(L,K\right)$.

Starting with $v\in K\cap N$ we must show $v\in\mathtt{proj}_{K}{L}$ . Set $u=\mathtt{proj}_{L}{v}$ , $w=\mathtt{proj}_{K}{u}$ , and $m=\mathrm{med}\!\left(u,v,w\right)$ . Then $m\in K$ since $v,w\in K$ . Since $K\cap L=\varnothing$ , we have $\mathbf{\Delta}\!\left(u,v\right)>0$ and $\mathbf{\Delta}\!\left(u,w\right)>0$ . Consider the point $m$ : we have $m\in I(u,w)$ and $m\in K$ ; by the choice of $w$ , $m$ must equal $w$ and therefore $w\in I(u,v)$ . Thus, $w=\mathtt{proj}_{K}{u}\in I(u,v)$ and $u=\mathtt{proj}_{L}{w}$ . By Proposition C.3, the pair $u,w$ is a gate for $K,L$ and we have

[TABLE]

Consider an element $a\in v\smallsetminus u$ . If $\mathfrak{h}(a;G)\cap L\neq\varnothing$ , pick $u^{\prime}\in\mathfrak{h}(a;G)\cap L$ . Then $m=\mathrm{med}\!\left(u,v,u^{\prime}\right)$ will satisfy $m\in\mathfrak{h}(a;G)\cap L$ as well as

[TABLE]

Now, $\mathbf{\Delta}\!\left(u,m\right)>0$ since $u\in\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}};G)$, and a contradiction to $u=\mathtt{proj}_{L}{v}$ is obtained. Thus, $\mathfrak{h}(a;G)\cap L$ must be empty, which means $L\subseteq\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}};G)$. Applying Lemma C.4 we obtain $a{{}^{\scriptscriptstyle\ast}}\in S\!\uparrow$.

Overall, we have shown that $v\smallsetminus u\subseteq S\!\uparrow{{}^{\scriptscriptstyle\ast}}$. We will now verify that $v\smallsetminus w=\varnothing$, finishing the proof. Indeed, were it not so, there would have been $h\in v\smallsetminus w$. On the one hand, $w\in I(u,v)$ implies $v\smallsetminus w\subset v\smallsetminus u$, and hence $h{{}^{\scriptscriptstyle\ast}}\in S\!\uparrow$. On the other hand, $h\notin w$ means $h{{}^{\scriptscriptstyle\ast}}\in w$ and therefore $h{{}^{\scriptscriptstyle\ast}}\notin\mathtt{sep}\!\left(L,K\right)=S\!\uparrow\cap T\!\uparrow{{}^{\scriptscriptstyle\ast}}$, which forces $h{{}^{\scriptscriptstyle\ast}}\in R$. Since $R\subset v$ (by choice of $v$), we have $h{{}^{\scriptscriptstyle\ast}}\in v$, contradicting our choice of $h$.∎

We will need the following technical corollary for the purposes of propagation:

Corollary C.9.

Let $S,T\subseteq\mathbf{\Sigma}$ be subsets and suppose $S$ is coherent. Let $L=\mathfrak{h}(S;G)$ and $K=\mathfrak{h}(\mathtt{coh}(T);G)$. Then:

[TABLE]

Proof.

Recall that $\mathtt{coh}(T)=T\!\uparrow\smallsetminus T\!\uparrow{{}^{\scriptscriptstyle\ast}}$, and set $J=T\!\uparrow\cap T\!\uparrow{{}^{\scriptscriptstyle\ast}}$, so that $T\!\uparrow=\mathtt{coh}(T)+J$ and $T\!\uparrow{{}^{\scriptscriptstyle\ast}}=\mathtt{coh}(T){{}^{\scriptscriptstyle\ast}}+J$. Then,

[TABLE]

Since $\mathtt{coh}(T)\!\uparrow=\mathtt{coh}(T)$, the last expression equals $\mathtt{proj}_{K}{L}$, by the preceding proposition. The proof of the second equality is similar.∎

Appendix D: Qualitative Snapshots (proofs)

D.1. Proof of Lemma 4.7.

For all $a,b,c\in\mathbf{\Sigma}$ one has $\mathfrak{h}(ac{{}^{\scriptscriptstyle\ast}})=\mathfrak{h}(abc{{}^{\scriptscriptstyle\ast}})\cup\mathfrak{h}(ab{{}^{\scriptscriptstyle\ast}}c{{}^{\scriptscriptstyle\ast}})\subseteq\mathfrak{h}(bc{{}^{\scriptscriptstyle\ast}})\cup\mathfrak{h}(ab{{}^{\scriptscriptstyle\ast}})$ . Thus, either the minimum of $\kappa$ over $\mathfrak{h}(ac{{}^{\scriptscriptstyle\ast}})$ is attained at a point of $\mathfrak{h}(bc{{}^{\scriptscriptstyle\ast}})$ or it is attained at a point of $\mathfrak{h}(ab{{}^{\scriptscriptstyle\ast}})$ (or both). Therefore one has $\mathtt{w}^{\kappa}_{ac{{}^{\scriptscriptstyle\ast}}}\geq\mathtt{w}^{\kappa}_{ab{{}^{\scriptscriptstyle\ast}}}$ or $\mathtt{w}^{\kappa}_{ac{{}^{\scriptscriptstyle\ast}}}\geq\mathtt{w}^{\kappa}_{bc{{}^{\scriptscriptstyle\ast}}}$ , as required. ∎
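Lemma 4.7 can be sanity-checked numerically. The sketch assumes, as in the proof above, that $\mathtt{w}^{\kappa}_{pq}$ is the minimum of $\kappa$ over $\mathfrak{h}(pq)$ (and $\infty$ when that set is empty). It takes the vertex set to be all complete $\ast$-selections on three hypothetical atoms (a free cube) and uses an arbitrary integer-valued `kappa` of our own choosing; strings encode elements and `x*` is the complement of `x`.

```python
import math
from itertools import product


def star(x):
    """Complement: 'x' <-> 'x*'."""
    return x[:-1] if x.endswith("*") else x + "*"


atoms = ["a", "b", "c"]
elems = [x for a in atoms for x in (a, star(a))]
# Hypothetical vertex set: every complete *-selection on the atoms.
vertices = [frozenset(a if bit else star(a) for a, bit in zip(atoms, bits))
            for bits in product([0, 1], repeat=len(atoms))]
# Hypothetical ranking: an arbitrary non-negative integer on each vertex.
kappa = {u: (3 * i + 1) % 5 for i, u in enumerate(vertices)}


def w(p, q):
    """w^kappa_{pq}: minimum of kappa over h(pq), or infinity if h(pq) is empty."""
    return min((kappa[u] for u in vertices if p in u and q in u),
               default=math.inf)


# Lemma 4.7: w_{ac*} >= min(w_{ab*}, w_{bc*}) for all a, b, c.
ok = all(w(a, star(c)) >= min(w(a, star(b)), w(b, star(c)))
         for a in elems for b in elems for c in elems)
print(ok)  # True
```

The check cannot fail for any choice of `kappa`, since it only uses the inclusion of halfspace intersections established in the proof.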

D.2. Proof of Proposition 4.9.

Denote $G=\mathbf{Res}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\delta\right)$ for the rest of this proof. First, we need to show that $ab\in G$ implies $b{{}^{\scriptscriptstyle\ast}}a{{}^{\scriptscriptstyle\ast}}\in G$. This is built into the definition, as $\mathtt{w}_{b{{}^{\scriptscriptstyle\ast}}a{{}^{\scriptscriptstyle\ast}}{{}^{\scriptscriptstyle\ast}}}=\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}$. Also, $\mathbf{0}a\in G$ is satisfied because $\mathtt{w}_{\mathbf{0}a}=\infty$. Finally, applying Lemma 4.7 we conclude that, for all $a,b\in\mathbf{\Sigma}$, one has $a\leq_{G}b\Rightarrow\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}>\delta$ when $\delta<\infty$, and $a\leq_{G}b\Rightarrow\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}=\infty$ when $\delta=\infty$. In particular, were $a\in N(G)\cap N(G){{}^{\scriptscriptstyle\ast}}$, then $a\leq_{G}a{{}^{\scriptscriptstyle\ast}}$ would have implied $\mathtt{w}_{a}=\mathtt{w}_{aa{{}^{\scriptscriptstyle\ast}}{{}^{\scriptscriptstyle\ast}}}>\delta$ (or $\mathtt{w}_{a}=\infty$ if $\delta=\infty$), while $a{{}^{\scriptscriptstyle\ast}}\leq_{G}a$ would have given $\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}}=\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}a{{}^{\scriptscriptstyle\ast}}}>\delta$ (or $\infty$, respectively). But that would have meant $\mathtt{w}_{\varnothing}\in(\delta,\infty]$, a contradiction. ∎
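A sketch of the residual PCR, assuming (as in the proof of Corollary 4.10) that $ab\in\mathbf{Res}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\delta\right)$ exactly when $\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}>\delta$, with $\mathtt{w}_{pq}$ the minimum of a toy ranking over $\mathfrak{h}(pq)$ on the cube of complete $\ast$-selections over three atoms. With $\delta=\mathtt{w}_{\varnothing}$ it checks the facts used above: closure under $ab\mapsto b{{}^{\scriptscriptstyle\ast}}a{{}^{\scriptscriptstyle\ast}}$, transitivity (a consequence of Lemma 4.7), and non-degeneracy. The function `res` and the ranking `kappa` are hypothetical.

```python
import math
from itertools import product


def star(x):
    """Complement: 'x' <-> 'x*'."""
    return x[:-1] if x.endswith("*") else x + "*"


atoms = ["a", "b", "c"]
elems = [x for a in atoms for x in (a, star(a))]
vertices = [frozenset(a if bit else star(a) for a, bit in zip(atoms, bits))
            for bits in product([0, 1], repeat=len(atoms))]
kappa = {u: (3 * i + 1) % 5 for i, u in enumerate(vertices)}  # hypothetical


def w(p, q):
    """Minimum of kappa over h(pq), or infinity if h(pq) is empty."""
    return min((kappa[u] for u in vertices if p in u and q in u),
               default=math.inf)


def res(delta):
    """Assumed form of Res(w; delta): the edges ab with w_{ab*} > delta."""
    return {(a, b) for a in elems for b in elems if w(a, star(b)) > delta}


w_empty = min(kappa.values())  # w_{emptyset}: minimum over all vertices
G = res(w_empty)
closed_under_star = all((star(b), star(a)) in G for a, b in G)
transitive = all((a, c) in G for a, b in G for b2, c in G if b2 == b)
degenerate = [a for a in elems if (a, star(a)) in G and (star(a), a) in G]
print(closed_under_star, transitive, degenerate)  # True True []
```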

D.3. Proof of Corollary 4.10.

With $r=\mathtt{w}_{ab}\geq\mathtt{w}_{\varnothing}$, we consider the PCR $G=\mathbf{Res}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};r\right)$, for which we have $p\leq_{G}q$ if and only if $\mathtt{w}_{pq{{}^{\scriptscriptstyle\ast}}}>r$. By Proposition 4.9, $G$ is non-degenerate, hence there exists $u\in G^{\circ}\subseteq\mathbb{H}$. We set $\nu=\delta_{u,r}$, the ranking taking the value $r$ at $u$ and $\infty$ elsewhere. For any $p,q\in u$, since $p\not\leq_{G}q{{}^{\scriptscriptstyle\ast}}$, we must have $\mathtt{w}_{pq}\leq r=\mathtt{w}^{\nu}_{pq}=\mathtt{w}_{ab}$. At the same time, if $\{p,q\}\nsubseteq u$, then $\mathtt{w}_{pq}\leq\infty=\mathtt{w}^{\nu}_{pq}$ again. Thus $u$ is the desired vertex of $\mathbb{H}$. ∎

D.4. Proof of Proposition 4.11.

Sufficiency follows from Lemma 4.7 and the observations following Definitions 4.1 and 4.5. Now, suppose $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ is a 2-ranking, and consider the set $\mathscr{K}$ of all rankings $\kappa$ satisfying $\mathtt{w}^{\kappa}_{ab}\geq\mathtt{w}_{ab}$ for all $a,b\in\mathbf{\Sigma}$ . By Example 4.4, the family $\mathscr{K}$ is closed under taking pointwise minima. Since $\mathbf{\Sigma}$ is finite, $\mathscr{K}$ must have a minimum element.

Let $\widehat{\mathtt{w}}$ be given by Equation 16. To prove that it coincides with the minimum of $\mathscr{K}$ it suffices to verify that (a) $\widehat{\mathtt{w}}\leq\kappa$ for all $\kappa\in\mathscr{K}$ , and that (b) $\widehat{\mathtt{w}}$ agrees with $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ .

Fix $\kappa\in\mathscr{K}$. Then, for any $u\in\mathbb{H}$ and $a,b\in u$ we have $\mathtt{w}^{\kappa}_{ab}\leq\kappa(u)$ because $\kappa$ is a ranking, and $\mathtt{w}_{ab}\leq\mathtt{w}^{\kappa}_{ab}$ by the particular choice of $\kappa$; by Equation 16 this gives $\widehat{\mathtt{w}}(u)\leq\kappa(u)$, proving (a). Finally, to prove (b), it suffices to verify that, for every $a,b\in\mathbf{\Sigma}$, there exists a ranking $\nu\in\mathscr{K}$ with $\mathtt{w}^{\nu}_{ab}=\mathtt{w}_{ab}$. Indeed, Corollary 4.10, applied with $r=\mathtt{w}_{ab}$, provides just such a ranking $\nu$, which finishes the proof. ∎
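The 2-restriction can be computed by brute force. The sketch below assumes that Equation 16 reads $\widehat{\kappa}(u)=\max_{a,b\in u}\mathtt{w}^{\kappa}_{ab}$, which is how the proofs of Proposition 4.11 and Lemma D.3 use it, and checks properties (a) and (b) for a hypothetical ranking on the cube of complete $\ast$-selections over three atoms (strings encode elements, `x*` is the complement of `x`). The names `restrict` and `kappa` are ours.

```python
import math
from itertools import product


def star(x):
    """Complement: 'x' <-> 'x*'."""
    return x[:-1] if x.endswith("*") else x + "*"


atoms = ["a", "b", "c"]
elems = [x for a in atoms for x in (a, star(a))]
vertices = [frozenset(a if bit else star(a) for a, bit in zip(atoms, bits))
            for bits in product([0, 1], repeat=len(atoms))]
kappa = {u: (3 * i + 1) % 5 for i, u in enumerate(vertices)}  # hypothetical


def w(rank, p, q):
    """w^rank_{pq}: minimum of rank over h(pq), or infinity if h(pq) is empty."""
    return min((rank[u] for u in vertices if p in u and q in u),
               default=math.inf)


def restrict(rank):
    """Assumed form of Equation 16: max of w^rank_{ab} over pairs a, b in u."""
    return {u: max(w(rank, a, b) for a in u for b in u) for u in vertices}


kappa_hat = restrict(kappa)
below = all(kappa_hat[u] <= kappa[u] for u in vertices)   # property (a)
agree = all(w(kappa_hat, p, q) == w(kappa, p, q)          # property (b)
            for p in elems for q in elems)
print(below, agree)  # True True
```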

D.5. Proof of Proposition 4.16.

We begin with an immediate observation regarding minsets:

Lemma D.1.

Let $\delta,\epsilon\geq 0$ , and let $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ be a 2-ranking. Let $G=\mathbf{Der}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\delta\right)$ and $M=\mathtt{M}(\mathtt{w}_{{\scriptscriptstyle\bullet}};\epsilon)$ . Then $M\in\mathbf{C}(G)$ and $\mathfrak{h}(M;G)\neq\varnothing$ .∎

Proof.

Let $G=\mathbf{Der}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}};\delta\right)$ and $M=\mathtt{M}(\mathtt{w}_{{\scriptscriptstyle\bullet}};\epsilon)$ for some $\delta,\epsilon\geq 0$. Clearly, $M$ is a $\ast$-selection. Now suppose that $a,b\in\mathbf{\Sigma}$ satisfy $ab\in G$. If $a\in M$, then $\mathtt{w}_{a}<\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}}-\epsilon$ and $\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}>\mathtt{w}_{ab},\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b{{}^{\scriptscriptstyle\ast}}}$. But $\mathtt{w}_{b{{}^{\scriptscriptstyle\ast}}}=\min\{\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}},\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b{{}^{\scriptscriptstyle\ast}}}\}$ then implies $\mathtt{w}_{b{{}^{\scriptscriptstyle\ast}}}=\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b{{}^{\scriptscriptstyle\ast}}}$, and hence also $\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}}\leq\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b{{}^{\scriptscriptstyle\ast}}}=\mathtt{w}_{b{{}^{\scriptscriptstyle\ast}}}$. Similarly, $\mathtt{w}_{a}=\min\{\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}},\mathtt{w}_{ab}\}=\mathtt{w}_{ab}\geq\mathtt{w}_{b}$, and we have $\mathtt{w}_{b}\leq\mathtt{w}_{a}<\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}}-\epsilon\leq\mathtt{w}_{b{{}^{\scriptscriptstyle\ast}}}-\epsilon$, proving $b\in M$. We conclude that $M$ is a forward-closed $\ast$-selection. In particular, it is $G$-coherent: if $a,b\in M$ satisfied $a\leq_{G}b{{}^{\scriptscriptstyle\ast}}$, then $b{{}^{\scriptscriptstyle\ast}}\in M$ because $M$ is forward-closed, contradicting $M$ being a $\ast$-selection.∎

To prove Proposition 4.16, we need to analyze the relationship between level sets of the rankings $\kappa$ and $\widehat{\kappa}$ . We have the following lemma:

Lemma D.2.

Let $\kappa$ be a ranking in $\mathbb{H}$ and fix a value $r\in\widehat{\mathds{N}}$ , $\mathtt{w}^{\kappa}_{\varnothing}\leq r<\infty$ . For the sub-level sets $F=[\kappa\leq r]$ and $\widehat{F}=[\widehat{\kappa}\leq r]$ one has: (a) $F\subseteq\widehat{F}$ , (b) $F^{\scriptscriptstyle{\sharp}}=\widehat{F}^{\scriptscriptstyle{\sharp}}$ , and (c) $\widehat{F}\subseteq\mathfrak{h}(F^{\scriptscriptstyle{\sharp}})$ .

Proof.

Since $\widehat{\kappa}\leq\kappa$, we have $F\subseteq\widehat{F}$, which, in turn, implies $\widehat{F}^{\scriptscriptstyle{\sharp}}\subseteq F^{\scriptscriptstyle{\sharp}}$. Conversely, if $a\in F^{\scriptscriptstyle{\sharp}}$, then $F\subseteq\mathfrak{h}(a)$ implies $\kappa(u)>r$ for all $u\in\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}})$; equivalently, $\kappa(\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}}))>r$. Since this information carries over to the 2-restriction of $\kappa$, we conclude that $\widehat{\kappa}(\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}}))>r$ as well, which means $a\in\widehat{F}^{\scriptscriptstyle{\sharp}}$, verifying (b). Assertion (c) follows directly from (b) via $\widehat{F}\subseteq\mathfrak{h}(\widehat{F}^{\scriptscriptstyle{\sharp}})=\mathfrak{h}(F^{\scriptscriptstyle{\sharp}})$. ∎
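Lemma D.2 can be tested on a toy example as well. Under our reading of Equation 16 as $\widehat{\kappa}(u)=\max_{a,b\in u}\mathtt{w}^{\kappa}_{ab}$, the sketch below checks assertions (a), (b) and (c) for every integer level $r$ between $\mathtt{w}^{\kappa}_{\varnothing}$ and $\max\kappa$, using a hypothetical ranking on the cube of complete $\ast$-selections over three atoms (strings encode elements, `x*` is the complement of `x`). The functions `sublevel` and `sharp` are our names for $[\,\cdot\leq r]$ and $F\mapsto F^{\scriptscriptstyle{\sharp}}$.

```python
import math
from itertools import product


def star(x):
    """Complement: 'x' <-> 'x*'."""
    return x[:-1] if x.endswith("*") else x + "*"


atoms = ["a", "b", "c"]
elems = [x for a in atoms for x in (a, star(a))]
vertices = [frozenset(a if bit else star(a) for a, bit in zip(atoms, bits))
            for bits in product([0, 1], repeat=len(atoms))]
kappa = {u: (3 * i + 1) % 5 for i, u in enumerate(vertices)}  # hypothetical


def w(rank, p, q):
    """Minimum of rank over h(pq), or infinity if h(pq) is empty."""
    return min((rank[u] for u in vertices if p in u and q in u),
               default=math.inf)


def restrict(rank):
    """Assumed form of Equation 16: max of w^rank_{ab} over pairs a, b in u."""
    return {u: max(w(rank, a, b) for a in u for b in u) for u in vertices}


def sublevel(rank, r):
    """The sub-level set [rank <= r]."""
    return [u for u in vertices if rank[u] <= r]


def sharp(F):
    """F-sharp: all a such that F is contained in h(a)."""
    return {a for a in elems if all(a in u for u in F)}


kappa_hat = restrict(kappa)
results = []
for r in range(min(kappa.values()), max(kappa.values()) + 1):
    F, F_hat = sublevel(kappa, r), sublevel(kappa_hat, r)
    results.append((
        set(F) <= set(F_hat),               # (a)
        sharp(F) == sharp(F_hat),           # (b)
        all(sharp(F) <= u for u in F_hat),  # (c)
    ))
print(all(all(t) for t in results))  # True
```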

One can say more about the lowest level sets of $\kappa$ :

Lemma D.3.

In the notation of Lemma D.2, let $G=\mathbf{Der}\!\left(\kappa;\delta\right)$ with $\delta\in\widehat{\mathds{N}}$ . Then:

(a) If $r\leq\delta+\mathtt{w}^{\kappa}_{\varnothing}\in\widehat{\mathds{N}}$, then $\widehat{F}\subseteq G^{\circ}$;

(b) If $\delta=0$, then $\mathfrak{h}(F^{\scriptscriptstyle{\sharp}})\cap G^{\circ}\subseteq\widehat{F}$.

Proof.

To prove (a), consider an arbitrary $u\in\widehat{F}$. Since $u$ is a complete $\ast$-selection, it suffices to show that it is forward-closed with respect to $G$. Take any $x\in u$ and $y\in\mathbf{\Sigma}$. If $xy\in G$, then:

[TABLE]

However, if $y\notin u$ , then we have $\{x,y{{}^{\scriptscriptstyle\ast}}\}\subseteq u$ . Since $\widehat{\kappa}(u)\leq r$ , we must have $\mathtt{w}^{\kappa}_{xy{{}^{\scriptscriptstyle\ast}}}\leq r$ , by Equation 16. This, however, contradicts our assumption regarding $r$ .

To verify (b), suppose $u\in\mathfrak{h}(F^{\scriptscriptstyle{\sharp}})\cap G^{\circ}$ , but $\widehat{\kappa}(u)>r$ . Applying Equation 16 again, we conclude there is a pair $x,y\in u$ with $\mathtt{w}^{\kappa}_{xy}>r$ . In particular, no element of $F$ is contained in $\mathfrak{h}(xy)$ , which leads to the following two complementary cases:

• **Both $\mathfrak{h}(xy{{}^{\scriptscriptstyle\ast}})$ and $\mathfrak{h}(x{{}^{\scriptscriptstyle\ast}}y)$ contain elements of $F$.** Then we have $\mathtt{w}^{\kappa}_{xy}>r\geq\mathtt{w}^{\kappa}_{xy{{}^{\scriptscriptstyle\ast}}},\mathtt{w}^{\kappa}_{x{{}^{\scriptscriptstyle\ast}}y}$, which means $xy{{}^{\scriptscriptstyle\ast}}\in G$. As $x\in u$, forward-closure would force $y{{}^{\scriptscriptstyle\ast}}\in u$ alongside $y\in u$, contradicting $u\in G^{\circ}$.

• **Either $F\cap\mathfrak{h}(x)=\varnothing$ or $F\cap\mathfrak{h}(y)=\varnothing$.** In other words, either $x{{}^{\scriptscriptstyle\ast}}\in F^{\scriptscriptstyle{\sharp}}$ or $y{{}^{\scriptscriptstyle\ast}}\in F^{\scriptscriptstyle{\sharp}}$. Since $F^{\scriptscriptstyle{\sharp}}\subseteq u$, we conclude that one of $x{{}^{\scriptscriptstyle\ast}},y{{}^{\scriptscriptstyle\ast}}$ must lie in $u$. This contradicts the fact that $x,y\in u$, since $u$ is a complete $\ast$-selection.

Thus, either case yields a contradiction, finishing the proof. ∎

We are finally ready to prove Proposition 4.16.

Setting $r=\kappa(\mathbb{H})$ (the minimum value of $\kappa$), we apply Lemma D.3(a) to conclude that $\widehat{F}\subseteq G^{\circ}$. Together with Lemma D.2(c), this gives $\widehat{F}\subseteq\mathfrak{h}(F^{\scriptscriptstyle{\sharp}})\cap G^{\circ}$, while Lemma D.3(b) gives the reverse inclusion. Hence $\mathfrak{h}(F^{\scriptscriptstyle{\sharp}};G)=\mathfrak{h}(F^{\scriptscriptstyle{\sharp}})\cap G^{\circ}$ coincides with $\widehat{F}$. Since $F\subseteq\widehat{F}$ by Lemma D.2(a), we conclude that $F\subseteq G^{\circ}$, and we may apply Corollary 2.36 to deduce that $\widehat{F}$ is the convex hull of $F$ in $\mathtt{Dual}\!\left(G\right)$. Finally, consider the set $M=\mathtt{M}(\kappa)$. For any $a\in\mathbf{\Sigma}$, at least one of $\kappa(\mathfrak{h}(a))$, $\kappa(\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}}))$ equals $\kappa(\mathbb{H})$, because $\mathbb{H}=\mathfrak{h}(a)\cup\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}})$. Therefore, we have $a\in M$ if and only if $\kappa(\mathfrak{h}(a))=\kappa(\mathbb{H})$ and $\kappa(\mathfrak{h}(a{{}^{\scriptscriptstyle\ast}}))>\kappa(\mathbb{H})$. This holds if and only if $a\in F^{\scriptscriptstyle{\sharp}}$ for our current choice of $F=[\kappa\leq\kappa(\mathbb{H})]$. This finishes the proof. ∎
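The two conclusions of the proof can also be checked numerically on small cubes. At the minimal level $r=\kappa(\mathbb{H})$, the proof asserts that $\widehat{F}=\mathfrak{h}(F^{\sharp})\cap G^{\circ}$ and that $\mathtt{M}(\kappa)=F^{\sharp}$. The Python sketch below tests both. It rests on the same assumptions as before: the reconstructed form of Equation 16, $ab\in\mathbf{Der}(\kappa;0)$ iff $\mathtt{w}_{ab^\ast}>\mathtt{w}_{ab},\mathtt{w}_{a^\ast b^\ast}$, and $a\in\mathtt{M}(\kappa)$ iff $\mathtt{w}_a<\mathtt{w}_{a^\ast}$. These encodings are our reading of the proofs, not the paper's definitions. The convex-hull statement (Corollary 2.36) is not tested.

```python
import itertools
import random

INF = float("inf")

def literals(n):
    return [(i, s) for i in range(n) for s in (True, False)]

def neg(lit):
    return (lit[0], not lit[1])

def holds(u, lit):
    return u[lit[0]] == lit[1]

def check_prop_4_16(kappa, n):
    worlds = list(kappa)
    lits = literals(n)

    def w(*S):
        return min((kappa[u] for u in worlds if all(holds(u, l) for l in S)),
                   default=INF)

    def kappa_hat(u):  # ASSUMED form of Equation 16
        true_lits = [l for l in lits if holds(u, l)]
        return max(w(x, y) for x, y in
                   itertools.combinations_with_replacement(true_lits, 2))

    # ASSUMED encoding of Der(kappa; 0): ab in G iff w_{ab*} > w_{ab}, w_{a*b*}
    G = {(a, b) for a in lits for b in lits
         if w(a, neg(b)) > w(a, b) and w(a, neg(b)) > w(neg(a), neg(b))}

    def in_G_circ(u):
        # the world u, viewed as a complete *-selection, is forward-closed in G
        return all(holds(u, b) for (a, b) in G if holds(u, a))

    r = min(kappa.values())                          # r = kappa(H)
    F = {u for u in worlds if kappa[u] <= r}
    F_hat = {u for u in worlds if kappa_hat(u) <= r}
    F_sharp = {a for a in lits if all(holds(u, a) for u in F)}
    hull = {u for u in worlds
            if all(holds(u, a) for a in F_sharp) and in_G_circ(u)}
    M = {a for a in lits if w(a) < w(neg(a))}        # ASSUMED encoding of M(kappa)
    return F_hat == hull and M == F_sharp

rng = random.Random(3)
n = 3
ok = all(check_prop_4_16({u: rng.randint(0, 5)
                          for u in itertools.product((False, True), repeat=n)}, n)
         for _ in range(200))
print(ok)  # True
```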

Appendix E Appendix: Real-Valued Snapshots (proofs).

E.1. Proof of Proposition 4.20.

We must show that the derived PCR $G=\mathbf{Der}\!\left(\mathtt{w}_{{\scriptscriptstyle\bullet}}\right)$ of a real-valued 2-weight $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ satisfying requirements 1–5 of Definition 4.17 is non-degenerate. Recall Equation 25, which defines $G$ given an assignment of thresholds $\tau_{{\scriptscriptstyle\bullet}}$:

[TABLE]

Define functions $\omega,\partial:\mathbf{\Sigma}\times\mathbf{\Sigma}\to\mathds{R}$ via $\omega(ab):=\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b}-\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}$ and $\partial(ab):=\mathtt{w}_{a{{}^{\scriptscriptstyle\ast}}b}+\mathtt{w}_{ab{{}^{\scriptscriptstyle\ast}}}$ . From properties 1. and 4. of the 2-weight $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ , one has the identities $\omega(ab)=-\omega(ba)$ , $\omega(aa)=0$ and $\omega(ac)=\omega(ab)+\omega(bc)$ . From properties 2. and 5. of the 2-weight $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ one also obtains the identities $\partial(ab)=\partial(ba)\geq 0$ , $\partial(aa)=0$ , $\partial(ac)\leq\partial(ab)+\partial(bc)$ , and $\partial(aa{{}^{\scriptscriptstyle\ast}})=\mathtt{w}_{\varnothing}$ .
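The identities for $\omega$ and $\partial$ are easy to test when the 2-weight comes from a finite measure $\mu$ on the Hamming cube, with $\mathtt{w}_{ab}=\mu(\mathfrak{h}(a)\cap\mathfrak{h}(b))$ and $\mathtt{w}_{\varnothing}=\mu(\mathbb{H})$. We assume this is a representative instance of properties 1–5 of Definition 4.17, which are not reproduced here. In that case $\omega(ab)=\mathtt{w}_b-\mathtt{w}_a$, and $\partial(ab)$ is the measure of the symmetric difference $\mathfrak{h}(a)\,\triangle\,\mathfrak{h}(b)$. The Python sketch below checks each identity with a small floating-point tolerance. The function name is hypothetical.

```python
import itertools
import random

def check_omega_partial(n=3, seed=0, tol=1e-9):
    rng = random.Random(seed)
    worlds = list(itertools.product((False, True), repeat=n))
    mu = {u: rng.random() for u in worlds}      # hypothetical finite measure
    lits = [(i, s) for i in range(n) for s in (True, False)]

    def neg(a):
        return (a[0], not a[1])

    def w(*S):
        # w_S = mu(h(S)); w() is the total mass w_emptyset
        return sum(mu[u] for u in worlds if all(u[i] == s for i, s in S))

    def omega(a, b):
        return w(neg(a), b) - w(a, neg(b))

    def partial(a, b):
        return w(neg(a), b) + w(a, neg(b))

    ok = True
    for a, b, c in itertools.product(lits, repeat=3):
        ok = ok and abs(omega(a, b) + omega(b, a)) < tol           # antisymmetry
        ok = ok and abs(omega(a, c) - omega(a, b) - omega(b, c)) < tol
        ok = ok and abs(partial(a, b) - partial(b, a)) < tol       # symmetry
        ok = ok and partial(a, b) >= 0
        ok = ok and partial(a, c) <= partial(a, b) + partial(b, c) + tol
    for a in lits:
        ok = ok and abs(omega(a, a)) < tol and abs(partial(a, a)) < tol
        ok = ok and abs(partial(a, neg(a)) - w()) < tol            # = w_emptyset
    return ok

print(check_omega_partial())  # True
```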

We are ready to prove the proposition. Suppose $a\leq_{G}a{{}^{\scriptscriptstyle\ast}}\leq_{G}a$ for some $a\in\mathbf{\Sigma}$, and find a sequence $a_{0},\ldots,a_{m},\ldots,a_{n}$ with $a_{0}=a$, $a_{m}=a{{}^{\scriptscriptstyle\ast}}$, $a_{n}=a$ and $a_{k-1}a_{k}\in G$ for $k=1,\ldots,n$. We then must have $\omega(aa)=\sum_{k=1}^{n}\omega(a_{k-1}a_{k})\geq 0$, with equality if and only if $\omega(a_{k-1}a_{k})=0$ for all $k=1,\ldots,n$. Since $\omega(aa)=0$, equality does hold. By the definition of $G$, this implies $\mathtt{w}_{a_{k-1}{{}^{\scriptscriptstyle\ast}}a_{k}}=\mathtt{w}_{a_{k-1}a_{k}{{}^{\scriptscriptstyle\ast}}}=0$ for all $k=1,\ldots,n$.

But then we also have $\mathtt{w}_{\varnothing}=\partial(aa{{}^{\scriptscriptstyle\ast}})\leq\sum_{k=1}^{m}\partial(a_{k-1}a_{k})=0$ , which is only possible when $\mathtt{w}_{{\scriptscriptstyle\bullet}}$ is trivial—a contradiction.∎

E.2. Proof of Proposition 4.21.

The proof of this proposition follows a standard scheme, widely attributed to Chernoff, and is only included here for the sake of completeness.

Recall that the Kullback-Leibler divergence of a $\mathbf{Ber}(q)$ random variable relative to a $\mathbf{Ber}(p)$ random variable is given by the following (footnote 15: we will always mean the natural base, $\mathtt{e}$, of the logarithm when using the notation $\log({\scriptscriptstyle\bullet})$):

$$\mathbf{D}_{{}_{KL}}\!\left(q\big{\|}p\right)=q\log\frac{q}{p}+(1-q)\log\frac{1-q}{1-p}.$$

We require the following standard lemma:

Lemma E.1 (KL-divergence bound).

Let $p,q\in(0,1)$ , and consider the function $f(\zeta):=\mathtt{e}^{-q\zeta}\left(1-p+p\mathtt{e}^{\zeta}\right)$ over the interval $(0,\infty)$ . Then

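$$\min_{\zeta>0}f(\zeta)=\mathtt{e}^{-\mathbf{D}_{{}_{KL}}\!\left(q\big{\|}p\right)}\qquad\text{whenever }q>p.$$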

Proof.

Differentiating $f$ one obtains:

$$f^{\prime}(\zeta)=\mathtt{e}^{-q\zeta}\left(p(1-q)\mathtt{e}^{\zeta}-q(1-p)\right).$$

The function $f$ has only one critical point:

$$\zeta_{0}=\log\frac{q(1-p)}{p(1-q)},$$

and the value of $f$ at $\zeta_{0}$ is the claimed value, $f\left(\zeta_{0}\right)=\mathtt{e}^{-\mathbf{D}_{{}_{KL}}\!\left(q\big{\|}p\right)}$ .

Finally, $\zeta_{0}>0$ if and only if $q(1-p)>p(1-q)$, which is tantamount to $q>p$. ∎
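A quick numerical check of Lemma E.1 follows, as a sketch with arbitrarily chosen $p<q$. It evaluates $f$ at the critical point $\zeta_0=\log\frac{q(1-p)}{p(1-q)}$, compares the result with $\mathtt{e}^{-\mathbf{D}_{KL}(q\|p)}$, and confirms by a crude grid search that no sampled $\zeta>0$ does better. The grid search works because $f(\zeta)=(1-p)\mathtt{e}^{-q\zeta}+p\,\mathtt{e}^{(1-q)\zeta}$ is a sum of convex functions, so its unique critical point is the global minimum.

```python
import math

def kl_bernoulli(q, p):
    """D_KL(Ber(q) || Ber(p)) in nats, for p, q in (0, 1)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def f(zeta, p, q):
    return math.exp(-q * zeta) * (1 - p + p * math.exp(zeta))

def critical_point(p, q):
    # zeta_0 = log(q(1-p) / (p(1-q))), positive exactly when q > p
    return math.log(q * (1 - p) / (p * (1 - q)))

p, q = 0.3, 0.55
z0 = critical_point(p, q)
print(z0 > 0)                                                    # True
print(math.isclose(f(z0, p, q), math.exp(-kl_bernoulli(q, p))))  # True
grid_min = min(f(k / 1000, p, q) for k in range(1, 10000))
print(grid_min >= f(z0, p, q) - 1e-12)                           # True
```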

The setting for learning snapshot weights described in Proposition 4.21 simplifies to the following. Suppose $\mathrm{X}\in[0,A]$ is a non-constant random variable, and let $\alpha:=\tfrac{\mathbb{E}\left[\mathrm{X}\right]}{A}$ . We posit a sequence $\mathrm{X}\big{|}_{\scriptscriptstyle{t}}$ , $t\geq 0$ of i.i.d. random variables $\mathrm{X}\big{|}_{\scriptscriptstyle{t}}\sim\mathrm{X}$ .

We have the following probability bounds:

Lemma E.2.

Let $t$ be a non-negative integer and let $\delta>0$ . Then, for $\mathrm{Y}:=\tfrac{1}{t+1}\sum_{s=0}^{t}\mathrm{X}\big{|}_{\scriptscriptstyle{s}}$ one has:

[TABLE]

where $\beta=\alpha+\tfrac{\delta}{A}$ and $\gamma:=\alpha-\tfrac{\delta}{A}$ .∎

To prove Proposition 4.21, observe that the first bound (the standard Chernoff bound) guarantees exponentially fast convergence in probability of the empirical snapshot weights $\mathtt{w}_{ab}\big{|}_{\scriptscriptstyle{t}}$ to the mean value of the signal over the domain $\rho(a)\cap\rho(b)$ when we take $\mathrm{X}\big{|}_{\scriptscriptstyle{t}}:=\varphi\big{|}_{\scriptscriptstyle{t}}\cdot\delta_{u_{t}}(\mathfrak{h}(ab))$.
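To illustrate the kind of guarantee Lemma E.2 provides, the following Monte Carlo sketch compares empirical tail frequencies of the sample mean of $t+1$ i.i.d. draws with the exponential bounds $\mathtt{e}^{-(t+1)\mathbf{D}_{KL}(\beta\|\alpha)}$ and $\mathtt{e}^{-(t+1)\mathbf{D}_{KL}(\gamma\|\alpha)}$. The signal (uniform on $[0,1]$, so $A=1$ and $\mathbb{E}[\mathrm{X}]=0.5$), the sample size, and $\delta$ are hypothetical choices made for illustration, not the paper's snapshot setting. The tails are taken to be $\mathrm{Y}\geq\mathbb{E}[\mathrm{X}]+\delta$ and $\mathrm{Y}\leq\mathbb{E}[\mathrm{X}]-\delta$.

```python
import math
import random

def kl_bernoulli(q, p):
    """D_KL(Ber(q) || Ber(p)) in nats, for p, q in (0, 1)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def chernoff_check(sample, A, mean, n, delta, trials, rng):
    """Empirical tail frequencies of the mean of n i.i.d. draws (n = t + 1),
    next to the bounds exp(-n D(beta||alpha)) and exp(-n D(gamma||alpha))."""
    alpha = mean / A
    beta, gamma = alpha + delta / A, alpha - delta / A
    upper = lower = 0
    for _ in range(trials):
        y = sum(sample(rng) for _ in range(n)) / n
        upper += y >= mean + delta
        lower += y <= mean - delta
    return {
        "upper_freq": upper / trials,
        "upper_bound": math.exp(-n * kl_bernoulli(beta, alpha)),
        "lower_freq": lower / trials,
        "lower_bound": math.exp(-n * kl_bernoulli(gamma, alpha)),
    }

# Hypothetical signal: X uniform on [0, A] with A = 1, so E[X] = 0.5.
result = chernoff_check(lambda r: r.random(), A=1.0, mean=0.5, n=50,
                        delta=0.1, trials=5000, rng=random.Random(0))
print(result)
```

In runs like this one, the observed frequencies typically sit well below the bounds. Chernoff bounds trade tightness for a clean exponential rate in $t$.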

We are left to verify the Chernoff bound.

Proof of Lemma E.2:

Recall $\alpha:=\tfrac{\mathbb{E}\left[\mathrm{X}\right]}{A}$, and observe that $0<\alpha<1$ because $\mathrm{X}$ is non-constant. We proceed in the standard way to bound the deviation of the sample mean $\mathrm{Y}$ from $\mathbb{E}\left[\mathrm{X}\right]$. For every fixed value of $t$, and recalling $\beta=\alpha+\tfrac{\delta}{A}$, one has:

[TABLE]

using the inequality $(\dagger)\;\mathtt{e}^{x\lambda}\leq 1-x+x\mathtt{e}^{\lambda}$ for $x\in[0,1]$ , $\lambda>0$ . Finally, this yields:

[TABLE]

Using Lemma E.1 to minimize the right hand side over $\zeta>0$ , we obtain, for every fixed $t$ :

[TABLE]

as claimed. Now replace $\mathrm{X}$ with $A-\mathrm{X}$ , $\mathrm{Y}$ with $A-\mathrm{Y}$ , and recall $\gamma:=\alpha-\tfrac{\delta}{A}$ . We obtain:

[TABLE]

finishing the proof.∎
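For reference, the display below spells out the textbook form of the upper-tail argument under the hypotheses of Lemma E.2, writing $N:=t+1$ and leaving $\zeta>0$ free. It is a reconstruction of the standard Chernoff scheme and may differ in presentation from the displays elided above. Markov's inequality and independence give the first inequality. Inequality $(\dagger)$, applied with $x=\mathrm{X}\big{|}_{\scriptscriptstyle{s}}/A$ and $\lambda=\zeta$ and then averaged using $\mathbb{E}[\mathrm{X}\big{|}_{\scriptscriptstyle{s}}/A]=\alpha$, gives the second:

$$\begin{aligned}\Pr\left[\mathrm{Y}\geq\mathbb{E}[\mathrm{X}]+\delta\right]&=\Pr\left[\mathtt{e}^{\frac{\zeta}{A}\sum_{s=0}^{t}\mathrm{X}|_{s}}\geq\mathtt{e}^{\zeta N\beta}\right]\leq\mathtt{e}^{-\zeta N\beta}\prod_{s=0}^{t}\mathbb{E}\left[\mathtt{e}^{\zeta\mathrm{X}|_{s}/A}\right]\\&\leq\left(\mathtt{e}^{-\beta\zeta}\left(1-\alpha+\alpha\mathtt{e}^{\zeta}\right)\right)^{N}.\end{aligned}$$

The bracketed expression is the function $f$ of Lemma E.1 with $(p,q)=(\alpha,\beta)$. When $\beta<1$, minimizing over $\zeta>0$ therefore gives $\Pr\left[\mathrm{Y}\geq\mathbb{E}[\mathrm{X}]+\delta\right]\leq\mathtt{e}^{-N\,\mathbf{D}_{{}_{KL}}\left(\beta\|\alpha\right)}$.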

Appendix F Appendix: Debugging Sniffy on the Circle.

The purpose of this appendix is to explain in some detail the reasons for Sniffy’s behavior on the circle in its qualitative BUA incarnation, as discussed in Section 5.4.2. We proceed in a manner similar to the discussion of an agent on the interval from Section A.2.4.

F.1. Sensors and Relations.

Let $a_{k}$ denote the sensor centered at $k\in\mathbf{E}=\{0,\ldots,19\}$, reporting $1$ at time $t$ if and only if $\mathtt{dist}\!\left(k,\mathtt{pos}(t)\right)\leq 4$, where $\mathtt{pos}(t)$ is the position occupied by Sniffy at time $t$. It will be convenient to think of $\mathbf{E}$ as a copy of the additive group $\mathbb{Z}_{20}$, keeping in mind its action on subsets $S\subseteq\mathbf{E}$ given by $k+S:=\{i-k\,|\,i\in S\}$.

It is then easy to verify that $a_{k}<a{{}^{\scriptscriptstyle\ast}}_{k+\{9,10,11\}}$ are the only relations among the $a_{k}$. Consequently, the analogous relations $\sharp a_{k}<\sharp a{{}^{\scriptscriptstyle\ast}}_{k+\{9,10,11\}}$ must also hold for all $k$ (see Section A.2.3 for details). Motion is described by the conditional relations $\mathtt{rt}:\sharp a_{k}<a_{k+1}$ and $\mathtt{lt}:\sharp a_{k}<a_{k-1}$, leading to the unconditional implications $\sharp a_{k}<a{{}^{\scriptscriptstyle\ast}}_{k+10}$ (compare with the case of the interval discussed in Section A.2.4).

Observing that the entire setting is rotation-invariant, we may assume without loss of generality, for the rest of this section, that Sniffy's target is located at position $T=0$. Setting $M:=\{a_{0},a{{}^{\scriptscriptstyle\ast}}_{10}\}\cup\{a_{\pm 1},\ldots,a_{\pm 4}\}\cup\{a_{\pm 5},\ldots,a_{\pm 9}\}{{}^{\scriptscriptstyle\ast}}$, the eventual target sets (minsets) determined by the individual snapshots are:

[TABLE]

where, due to the hard-wired arbitration enforcing $\neg(\mathtt{rt}\wedge\mathtt{lt})$ at all times, the derived PCR $G^{\mathtt{rt}}$ of the $\mathtt{rt}$ snapshot identifies each $a_{k}$ with $\sharp a_{k-1}$ ; and, similarly, $G^{\mathtt{lt}}$ identifies each $a_{k}$ with $\sharp a_{k+1}$ .

Finally, note that given $\mathtt{rt}{{}^{\scriptscriptstyle\ast}}$ it is impossible to witness $a_{k+9}\wedge\sharp a_{k}$ or $a_{k+10}\wedge\sharp a_{k}$ (while $a_{k+11}=a_{k-9}$ is still possible in conjunction with $\sharp a_{k}$ , if $\mathtt{lt}$ is active). We conclude that the relations $\sharp a_{k}<a{{}^{\scriptscriptstyle\ast}}_{k+9},a{{}^{\scriptscriptstyle\ast}}_{k+10}$ hold in $G^{\mathtt{rt}{{}^{\scriptscriptstyle\ast}}}$ for every $k$ , as do their analogous counterparts in $G^{\mathtt{lt}{{}^{\scriptscriptstyle\ast}}}$ .
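The relations listed in this subsection can be recovered by brute force over the 20 positions. The Python sketch below does this for $k=0$, which suffices by rotation invariance. It assumes that in one time step Sniffy moves one position left, stays put, or moves one position right. It reads "given $\mathtt{rt}^{\ast}$" as "no step to the right" and "given $\mathtt{lt}^{\ast}$" as "no step to the left". These motion conventions are assumptions of the sketch. A relation $\sharp a_0<a^{\ast}_j$ is reported when no allowed transition from a position seen by $a_0$ lands in the window of $a_j$.

```python
N, R = 20, 4  # circle Z_20 and sensor radius 4, as in Section F.1

def dist(i, j):
    d = abs(i - j) % N
    return min(d, N - d)

def on(k, pos):
    # sensor a_k reports 1 iff Sniffy is within distance R of k
    return dist(k, pos) <= R

# Static relations a_0 < a_j^*: sensors a_0 and a_j are never on together.
static = {j for j in range(N)
          if not any(on(0, p) and on(j, p) for p in range(N))}

def excluded_after(moves):
    # offsets j with (sharp a_0) -> a_j^*, given the allowed one-step moves
    return {j for j in range(N)
            if all(not on(j, (p + m) % N)
                   for p in range(N) if on(0, p) for m in moves)}

ALL_MOVES = (-1, 0, 1)                    # ASSUMPTION: left, stay, right
unconditional = excluded_after(ALL_MOVES)
given_not_rt = excluded_after((-1, 0))    # rt* read as "no step right"
given_not_lt = excluded_after((0, 1))     # lt* read as "no step left"
rt_shift = all(on(1, (p + 1) % N) for p in range(N) if on(0, p))  # sharp a_0 < a_1

print(sorted(static), sorted(unconditional),
      sorted(given_not_rt), sorted(given_not_lt), rt_shift)
# [9, 10, 11] [10] [9, 10] [10, 11] True
```

The output reproduces the static relations $a_0<a^{\ast}_{\{9,10,11\}}$, the unconditional $\sharp a_0<a^{\ast}_{10}$, and the extra relations available in $G^{\mathtt{rt}^{\ast}}$ and $G^{\mathtt{lt}^{\ast}}$.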

F.2. The “dull peak” value signal.

The weights recorded on any snapshot in this case take values in $\{0,1,\infty\}$. This implies that (1) any implications appearing in Sniffy's four snapshots at any time are among the implications listed in the preceding section; and (2) any raw observation generated by Sniffy is coherent for each of its snapshots.

Let $k\in\mathbf{E}$. Each snapshot forms its prediction for the next state by propagating $\sharp(-k+M)$, giving rise to:

[TABLE]

In order to compute the divergences from the targets, we first note that, for $k,\ell\in\mathbf{E}$ :

[TABLE]

For the $\mathtt{lt}$ snapshot this results in:

[TABLE]

By symmetry, we conclude that the $\mathtt{rt}$ snapshot has:

[TABLE]

Now, for the $\mathtt{lt}{{}^{\scriptscriptstyle\ast}}$ snapshot we have:

[TABLE]

where $\delta(k)=2$ for $-9\leq k\leq 0$, $\delta(k)=0$ for $0<k\leq 9$, and $\delta(k)=1$ for $k=10$. By symmetry:

[TABLE]

Armed with these formulae, we go over the possibilities and conclude:

• The BUA $\mathtt{lt}$ is active if and only if $k\in T+\{1,\ldots,11\}\pmod{20}$;

• The BUA $\mathtt{rt}$ is active if and only if $k\in T-\{1,\ldots,11\}\pmod{20}$.

In particular, $T\pm\{0,\ldots,8\}$ is a basin of attraction for the target position $T$, while $T+\{9,10,11\}$ is a region where both $\mathtt{lt}$ and $\mathtt{rt}$ seek to be active, triggering the hard-wired arbitration mechanism.
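The deduction in the last paragraph is modular set arithmetic. The Python sketch below encodes the two activity rules with $T=0$, recovers the conflict region $T+\{9,10,11\}$, and checks that every position of $T\pm\{0,\ldots,8\}$ walks to the target when the single active BUA moves Sniffy one step. The helper `step` is hypothetical. It does not model the hard-wired arbitration, so it simply leaves Sniffy in place when both BUAs, or neither, are active.

```python
N, T = 20, 0  # circle size; target placed at 0 by rotation invariance

lt = {(T + i) % N for i in range(1, 12)}   # lt active iff k in T+{1..11}
rt = {(T - i) % N for i in range(1, 12)}   # rt active iff k in T-{1..11}
conflict = lt & rt                         # both BUAs seek to be active
basin = {(T + s * i) % N for i in range(9) for s in (1, -1)}  # T +- {0..8}

def step(k):
    # one move under exactly one active BUA; otherwise stay put
    # (arbitration between lt and rt is not modeled here)
    if k in lt and k not in rt:
        return (k - 1) % N
    if k in rt and k not in lt:
        return (k + 1) % N
    return k

def reaches_target(k, max_steps=N):
    for _ in range(max_steps):
        if k == T:
            return True
        k = step(k)
    return k == T

print(sorted(conflict))                       # [9, 10, 11]
print(basin == set(range(N)) - conflict)      # True
print(all(reaches_target(k) for k in basin))  # True
```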

F.3. The “sharp peak” value signal.

Despite the value signal being more informative than that of its $\{0,1\}$ -valued “dull-peak” counterpart, a “sharp peak” qualitative agent’s performance on the target-finding task is clearly worse (Figure 7).

In a nutshell, the reason for this deficiency is that the limiting PCR satisfies the additional relations $a_{\pm 9}<a_{\pm 8}<\ldots<a_{\pm 1}$, which we verified by hand. As a result, properties (1) and (2) stated in Section F.2 for the "dull peak" setting do not hold in this one. In fact, these extra relations alone (there may be others) suffice for the following effect. In any of Sniffy's snapshots, the current state representation of any point other than $k=0,10$ degenerates (through coherent projection) into a less and less complete $\ast$-selection as Sniffy's physical distance from the target (in the environment) increases, with the quality of prediction deteriorating accordingly.

Specifically, any $k\in B:=10\pm\{0,1,2,3\}$ yields a raw observation containing $a_{10}$, $a{{}^{\scriptscriptstyle\ast}}_{0}$, and both $a_{9}$ and $a_{11}$. On the one hand, $a_{9},\ldots,a_{1}$ and $a_{11},\ldots,a_{19}$ are directed paths in each of Sniffy's PCRs. On the other hand, so are the pairs $a_{j},a{{}^{\scriptscriptstyle\ast}}_{j+10}$ for every $j\in\mathbf{E}$. For instance, $a_{9}$ leads to $a{{}^{\scriptscriptstyle\ast}}_{19}$ while $a_{11}$ leads to $a_{19}$. Thus, (1) the coherent projection of the raw observation generated by $k$ is merely $\{a_{10},a{{}^{\scriptscriptstyle\ast}}_{0}\}$, showing that Sniffy is unable to distinguish among the points of $B$; and (2) if $k$ is moved closer to the target, fewer conflicts of the above form will affect coherent projection.

Together, the observations above suffice to explain the most visible differences (see Figure 7) between the behaviors of the two variants of Sniffy in the qualitative learning regime.

