Semantic expressive capacity with bounded memory

Antoine Venant; Alexander Koller

arXiv:1906.11752·cs.CL·June 28, 2019

TL;DR

This paper explores the limits of semantic parsing mechanisms, showing that projective systems require unbounded memory to represent certain relations, unlike nonprojective systems, impacting both grammar-based and neural models.

Contribution

It provides the first proof demonstrating the memory requirements for projective semantic parsing mechanisms, highlighting fundamental differences from nonprojective systems.

Findings

01

Projective mechanisms need unbounded memory for certain relations

02

Nonprojective mechanisms can represent these relations without unbounded memory

03

Implications for the design of grammar-based and neural semantic parsers

Abstract

We investigate the capacity of mechanisms for compositional semantic parsing to describe relations between sentences and semantic representations. We prove that in order to represent certain relations, mechanisms which are syntactically projective must be able to remember an unbounded number of locations in the semantic representations, where nonprojective mechanisms need not. This is the first result of this kind, and has consequences both for grammar-based and for neural systems.

Equations

\mathcal{R}\mathcal{E}\mathcal{L}({\mathcal{G}},h,\mathcal{A})=\{(\operatorname{\mathsf{yd}}(t),\llbracket h(t)\rrbracket_{\mathcal{A}})\mid t\in L(G)\}

\mathcal{L}({\mathbb{G}},\mathcal{A})=\{\mathcal{R}\mathcal{E}\mathcal{L}({\mathcal{G}},h,\mathcal{A})\mid{\mathcal{G}}\in{\mathbb{G}},\mbox{$h$ LTH }\}.

h(\mbox{lönd}) =

\displaystyle\;||\;\mathsf{f}_{\mathsf{o}}(\langle\mbox{$\mathsf{rt}$}\rangle\xrightarrow{\mbox{ARG1}}\langle\mbox{$\mathsf{o}$}\rangle\;||\;\mathsf{ren}_{\mathsf{rt}\rightarrow\mathsf{o}}(G_{(f)}))

\displaystyle\;||\;\mathsf{f}_{\mathsf{o}}(\langle\mbox{$\mathsf{rt}$}\rangle\xrightarrow{\mbox{ARG2}}\langle\mbox{$\mathsf{o}$}\rangle\;||\;\mathsf{ren}_{\mathsf{rt}\rightarrow\mathsf{o}}(G_{(g)}))

A=\{a\langle^{k}\overline{a}^{k}\rangle^{k}\mid k\geq 0\},

\begin{array}{rclrcl}h(*_{1})&=&\mathsf{f}_{\mathsf{s}}(x_{1}\;||\;\mathsf{ren}_{\mathsf{rt}\rightarrow\mathsf{s}}(x_{2}))&h(*_{0})&=&x_{1}\\ h(b)&=&d\xleftarrow{}b\langle\mathsf{rt}\rangle\xrightarrow{}\langle\mathsf{s}\rangle&h(b!)&=&d\xleftarrow{}b\langle\mathsf{rt}\rangle\\ h(a)&=&c\xleftarrow{}a\langle\mathsf{rt}\rangle\xrightarrow{}\langle\mathsf{s}\rangle\end{array}

e_{\overline{y}}^{t'}\geq n_{\overline{y}}^{t}-n_{x}^{t}l-m_{\overline{y}}^{t}

e_{\overline{x}}^{t'}\leq n_{x}^{t}l+m_{\overline{x}}^{t}(l+1).

w_{k,l}=(a\overline{a}^{(s)})^{r}(b\overline{b}^{(s)})^{r}(c\overline{c}^{(s)})^{r}(d\overline{d}^{(s)})^{r}.

|\operatorname{\mathsf{yd}}(t)|_{x}=|\operatorname{\mathsf{yd}}(t)|_{y}.

|\operatorname{\mathsf{yd}}(t')|_{x}=|\operatorname{\mathsf{yd}}(t')|_{y}.

|\operatorname{\mathsf{yd}}(t')|_{x}+|s|_{x}=|\operatorname{\mathsf{yd}}(t')|_{y}+|s|_{y}.

e_{\overline{x}}^{t_{0}}\leq n_{\overline{x}}^{t_{0}}+n_{x}^{t}l_{0}.

e_{\overline{x}}^{t_{0}'}\leq n_{\overline{x}}^{t_{0}'}+n_{x}^{t'}l_{0}.

e_{\overline{x}}^{t_{0}}\leq e_{\overline{x}}^{t_{0}'}+n_{\overline{x}}^{C_{2}[C_{4}]}\leq n_{\overline{x}}^{t_{0}}+n_{x}^{t}l_{0}.

e_{\overline{y}}^{t_{0}}\geq n_{\overline{y}}^{t}-n_{x}^{t}l_{0}-r_{\overline{y}}^{t}

e_{\overline{y}}^{t_{0}'}\geq n_{\overline{y}}^{t'}-n_{x}^{t'}l_{0}-r_{\overline{y}}^{t'}.

e_{\overline{y}}^{t_{0}}\geq n_{\overline{y}}^{t}-n_{x}^{t}l_{0}-m_{\overline{y}}^{t}

e_{\overline{x}}^{t_{0}}\leq n_{x}^{t}l+m_{\overline{x}}^{t}(l_{0}+1).

Full text

Antoine Venant

Alexander Koller

Department of Language Science and Technology

Saarland University

{venant|koller}@coli.uni-saarland.de

1 Introduction

Semantic parsers which translate a sentence into a semantic representation compositionally must recursively compute a partial semantic representation for each node of a syntax tree. These partial semantic representations usually contain placeholders at which arguments and modifiers are attached in later composition steps. Approaches to semantic parsing differ in whether they assume that the number of placeholders is bounded or not. Lambda calculus Montague (1974); Blackburn and Bos (2005) assumes that the number of placeholders (lambda-bound variables) can grow unboundedly with the length and complexity of the sentence. By contrast, many methods which are based on unification Copestake et al. (2001) or graph merging Courcelle and Engelfriet (2012); Chiang et al. (2013) assume a fixed set of placeholders, i.e. the number of placeholders is bounded.

Methods based on bounded placeholders are popular both in the design of hand-written grammars Bender et al. (2002) and in semantic parsing for graphs Peng et al. (2015); Groschwitz et al. (2018). However, it is not clear that all relations between language and semantic representations can be expressed with a bounded number of placeholders. The situation is particularly challenging when one insists that the compositional analysis is projective in the sense that each composition step must combine adjacent substrings of the input sentence. In this case, it may be impossible to combine a semantic predicate with a distant argument immediately, forcing the composition mechanism to use up a placeholder to remember the argument position. If many predicates have distant arguments, this may exceed the bounded “memory capacity” of the compositional mechanism.

In this paper, we show that there are relations between sentences and semantic representations which can be described by compositional mechanisms which are bounded and non-projective, but not by ones which are bounded and projective. To our knowledge, this is the first result on expressive capacity with respect to semantics – in contrast to the extensive literature on the expressive capacity of mechanisms which describe just the string languages.

More precisely, we prove that tree-adjoining grammars can describe string-graph relations using the HR graph algebra Courcelle and Engelfriet (2012) with two sources (bounded, non-projective) which cannot be described using linear monadic context-free tree grammars and the HR algebra with $k$ sources, for any fixed $k$ (bounded, projective). This result is especially surprising because TAG and linear monadic CFTGs describe the same string languages; thus the difference lies only in the projectivity of the syntactic analysis.

We further prove that given certain assumptions on the alignment between tokens in the sentence and edges in the graph, no generative device for projective syntax trees can simulate TAG with two sources. This has practical consequences for the design of transition-based semantic parsers (whether grammar-based or neural).

Plan of the paper. We will first explain the linguistic background in Section 2 and lay the formal foundations in Section 3. We will then prove the reduced semantic expressive capacity for aligned generative devices in Section 4 and for CFTGs in Section 5. We conclude with a discussion of the practical impact of our findings (Section 6).

2 Compositional semantic construction

The Principle of Compositionality, which is widely accepted in theoretical semantics, states that the meaning of a natural-language expression can be determined from the meanings of its immediate subexpressions and the way in which the subexpressions were combined. Implementations of this principle usually assume that there is some sort of syntax tree which describes the grammatical structure of a sentence. A semantic representation is then calculated by bottom-up evaluation of this syntax tree, starting with semantic representations of the individual words and then recursively computing a semantic representation for each node from those of its children.

2.1 Compositional mechanisms

Mechanisms for semantic composition will usually keep track of places at which semantic arguments are still missing or modifiers can still be attached. For instance, when combining the semantic representations for “John” and “sleeps” in a derivation of “John sleeps”, the “subject” argument of “sleeps” is filled with the meaning of “John”. The compositional mechanism therefore assigns a semantic representation to “sleeps” which has an unfilled placeholder for the subject.

The exact nature of the placeholder depends on the compositional mechanism. There are two major classes in the literature. Lambda-style compositional mechanisms use a list of placeholders. For instance, lambda calculus, as used e.g. in Montague Grammar Montague (1974), CCG Steedman (2001), or linear-logic-based approaches in LFG Dalrymple et al. (1995) might represent “sleeps” as $\lambda x.\mathsf{sleep}(x)$ . Placeholders are lambda-bound variables (here: $x$ ).

By contrast, unification-style compositional mechanisms use names for placeholders. For example, a simplified form of the Semantic Algebra used in HPSG Copestake et al. (2001) might represent “sleeps” as the feature structure $[\mathsf{subj}{:}\framebox{1},\mathsf{sem}{:}[\mathsf{pred}{:}\mathsf{sleep},\mathsf{agent}{:}\framebox{1}]]$ . This is unified with $[\mathsf{subj}{:}\mathsf{John}]$ . The placeholders are holes with labels from a fixed set of argument names (e.g. $\mathsf{subj}$ ). Named placeholders are also used in the HR algebra Courcelle and Engelfriet (2012) and its derivatives, like Hyperedge Replacement Grammars Drewes et al. (1997); Chiang et al. (2013) and the AM algebra Groschwitz et al. (2018).
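The contrast between the two styles of placeholder can be made concrete with a toy sketch. It is only an illustration under simplifying assumptions: the helper names `hole` and `fill` are hypothetical, and `fill` is a plain substitution standing in for real feature-structure unification.

```python
# Lambda-style: the placeholder is a bound variable; function
# application fills it.
sleeps_lambda = lambda x: ("sleep", x)

# Unification-style (toy): placeholders are named holes drawn from a
# fixed set of argument names. `hole` and `fill` are hypothetical helpers.
def hole(name):
    return ("hole", name)

def fill(structure, name, value):
    """Replace every hole called `name` by `value`.

    This is simple substitution, a stand-in for unification.
    """
    return {key: (value if val == hole(name) else val)
            for key, val in structure.items()}

sleeps_named = {"pred": "sleep", "agent": hole("subj")}

john_sleeps_lambda = sleeps_lambda("John")
john_sleeps_named = fill(sleeps_named, "subj", "John")
```

In the lambda-style version the argument is identified by its position in the list of bound variables; in the named version it is identified by the label `subj`, which must come from a set fixed in advance.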

2.2 Boundedness and projectivity

A fundamental difference between lambda-style and unification-style compositional mechanisms is in their “memory capacity”: the number of placeholders in a lambda-style mechanism can grow unboundedly with the length and complexity of the sentence (e.g. by functional composition of lambda terms), whereas in a unification-style mechanism, the placeholders are fixed in advance.

There is an informal intuition that unbounded memory is needed especially when an unbounded number of semantic predicates can be far away from their arguments in the sentence, and the syntax formalism does not allow these predicates to combine immediately with the arguments. For illustration, consider the two derivations of the following Swiss German sentence from Shieber (1985) in Fig. 1:

(dass) (mer) d’ chind em Hans es huus lönd hälfed aastriiche

(that) (we) the-children-ACC Hans-DAT the-house-ACC let help paint

‘(that we) let the children help Hans paint the house’

The lexical semantic representation of each verb comes with a placeholder for its object ( $o_{1},o_{2},o_{3}$ ) and, in the case of “lönd” and “hälfed”, also one for its verb complement ( $v$ ). The derivation in Fig. 1a immediately combines each verb with its complements; the placeholders that are used at each node never grow beyond the ones the verbs originally had. However, this derivation combines verbs with nouns which are not adjacent in the string, which is not allowed in many grammar formalisms. If we limit ourselves to combining only adjacent substrings (projectively, see Fig. 1b), we must remember the placeholders for all the verbs at the same time if we want to obtain the correct predicate-argument structure. Thus, the number of placeholders grows with the length of the sentence; this is only possible with a lambda-style compositional mechanism.

There is scattered evidence in the literature for this tension between bounded memory and projectivity. Chiang et al. (2013) report (of a compositional mechanism based on the HR algebra, unification-style) that a bounded number of placeholders suffices to derive the graphs in the AMR version of the Geoquery corpus, but Groschwitz et al. (2018) find that this requires non-projective derivations in 37% of the AMRBank training data Banarescu et al. (2013). Approaches to semantic construction with tree-adjoining grammar either perform semantic composition along the TAG derivation tree using unification (non-projective, unification-style) Gardent and Kallmeyer (2003) or along the TAG derived tree using linear logic (projective, lambda-style) Frank and van Genabith (2001). Bender (2008) discusses the challenges involved in modeling the predicate-argument structure of a language with very free word order (Wambaya) with projective syntax. While the Wambaya noun phrase does not seem to require the projective grammar to collect unbounded numbers of unfilled arguments as in Fig. 1b, Bender notes that her projective analysis still requires a more flexible handling of semantic arguments than the HPSG Semantic Algebra (unification-style) supports.

In this paper, we define a notion of semantic expressive capacity and prove the first formal results about the relationship between projectivity and bounded memory.

3 Formal background

Let $\mathbb{N}_{0}=\{0,1,\ldots\}$ be the nonnegative integers. A signature is a finite set $\Sigma$ of function symbols $f$ , each of which has been assigned a nonnegative integer called its rank. We write $\Sigma_{n}$ for the symbols of rank $n$ . Given a signature $\Sigma$ , we say that all constants $a\in\Sigma_{0}$ are trees over $\Sigma$ ; further, if $f\in\Sigma_{n}$ and $t_{1},\ldots,t_{n}$ are trees over $\Sigma$ , then $f(t_{1},\ldots,t_{n})$ is also a tree. We write $\mathcal{T}_{\Sigma}$ for the set of all trees over $\Sigma$ . We define the height $\operatorname{\mathsf{ht}}(t)$ of a tree $t=f(t_{1},\ldots,t_{n})$ to be $1+\max\operatorname{\mathsf{ht}}(t_{i})$ , and $\operatorname{\mathsf{ht}}(c)=1$ for $c\in\Sigma_{0}$ .
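As a minimal sketch of these definitions, trees over a signature can be encoded as nested tuples. The encoding (a pair of a symbol and a tuple of children) and the names `node`, `well_formed` and `height` are assumptions made for illustration only.

```python
def node(symbol, *children):
    """Build the tree symbol(children...); constants take no children."""
    return (symbol, tuple(children))

def well_formed(t, rank):
    """Check that every symbol in t is used with its assigned rank.

    `rank` maps each function symbol of the signature to its rank.
    """
    symbol, children = t
    return (rank.get(symbol) == len(children)
            and all(well_formed(c, rank) for c in children))

def height(t):
    """ht(c) = 1 for constants; ht(f(t1..tn)) = 1 + max ht(ti)."""
    _, children = t
    if not children:
        return 1
    return 1 + max(height(c) for c in children)

# Example signature: f has rank 2, g has rank 1, a and b are constants.
rank = {"f": 2, "g": 1, "a": 0, "b": 0}
t = node("f", node("a"), node("g", node("b")))
```

Here `t` encodes the tree $f(a,g(b))$, whose height is 3 under the definition above.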

Let $X\notin\Sigma$ , and let $\Sigma_{X}=\Sigma\cup\{X\}$ (with $X$ as a constant of rank 0). Then we call a tree $C\in\mathcal{T}_{\Sigma_{X}}$ a context if it contains exactly one occurrence of $X$ , and write $\mathcal{C}_{\Sigma}$ for the set of all contexts. A context can be seen as a tree with exactly one hole. If $t\in\mathcal{T}_{\Sigma}$ , we write $C[t]$ for the tree in $\mathcal{T}_{\Sigma}$ that is obtained by replacing $X$ with $t$ .

Given a string $w\in W^{*}$ , we write $|w|_{a}$ for the number of times that $a\in W$ occurs in $w$ .
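Contexts, the substitution $C[t]$, and the counting notation $|w|_{a}$ can be sketched in the same nested-tuple encoding. The names `HOLE`, `is_context`, `plug` and `count_symbol` are illustrative assumptions and do not come from the paper.

```python
# A tree is (symbol, children); the hole X is a constant.
HOLE = ("X", ())

def count_holes(t):
    """Number of occurrences of X in t."""
    if t == HOLE:
        return 1
    return sum(count_holes(c) for c in t[1])

def is_context(t):
    """A context contains exactly one occurrence of X."""
    return count_holes(t) == 1

def plug(C, t):
    """Return C[t]: the tree obtained by replacing X in C with t."""
    if C == HOLE:
        return t
    symbol, children = C
    return (symbol, tuple(plug(c, t) for c in children))

def count_symbol(w, a):
    """|w|_a: how often the symbol a occurs in the sequence w."""
    return sum(1 for s in w if s == a)

C = ("f", (("a", ()), HOLE))     # the context f(a, X)
t = ("g", (("b", ()),))          # the tree g(b)
```

Plugging `t` into `C` gives $f(a,g(b))$. Plugging a context into a context gives another context, which is how nested decompositions such as $C_{1}[C_{2}[\ldots]]$ can be built.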

3.1 Grammars for strings and trees

We take a very general view on how semantic representations for strings are constructed compositionally. To this end, we define a notion of “grammar” which encompasses not only traditional grammars but also other devices for describing languages, such as transition-based parsers.

We say that a tree grammar $G$ over the signature $\Sigma$ is any finite device that defines a language $L(G)\subseteq\mathcal{T}_{\Sigma}$ . For instance, regular tree grammars Comon et al. (2007) are tree grammars, and context-free grammars can also be seen as tree grammars defining the language of parse trees.

We say that a string grammar ${\mathcal{G}}=(G,\operatorname{\mathsf{yd}})$ over the signature $\Sigma$ and the alphabet $W$ is a pair consisting of a tree grammar $G$ over $\Sigma$ and a yield function $\operatorname{\mathsf{yd}}:\mathcal{T}_{\Sigma}\rightarrow W^{*}$ which maps trees to strings over $W$ Weir (1988). A string grammar defines a language $L({\mathcal{G}})=\{\operatorname{\mathsf{yd}}(t)\mid t\in L(G)\}\subseteq W^{*}$ . We call the trees $t\in L(G)$ derivations.

A particularly common yield function is the function $\operatorname{\operatorname{\mathsf{yd}}_{pr}}$ , defined as $\operatorname{\operatorname{\mathsf{yd}}_{pr}}(f(t_{1},\ldots,t_{n}))=\operatorname{\operatorname{\mathsf{yd}}_{pr}}(t_{1})\cdot\ldots\cdot\operatorname{\operatorname{\mathsf{yd}}_{pr}}(t_{n})$ if $n>0$ and $\operatorname{\operatorname{\mathsf{yd}}_{pr}}(c)=c$ if $c$ has rank 0. This yield function simply concatenates the words at the leaves of $t$ . Applied to the phrase-structure tree $t$ in Fig. 2c, $\operatorname{\operatorname{\mathsf{yd}}_{pr}}(t)$ is the Swiss German sentence from Section 2.2. Context-free grammars can be characterized as string grammars that combine a regular tree grammar with $\operatorname{\operatorname{\mathsf{yd}}_{pr}}$ . By contrast, we can model tree-adjoining grammars (TAG, Joshi and Schabes, 1997) by choosing a tree grammar $G$ that describes derivation trees as in Fig. 2b. The $\operatorname{\mathsf{yd}}$ function could then substitute and adjoin the elementary trees as specified by the derivation tree (see Fig. 2a) and then read off the words from the resulting derived tree in Fig. 2c.
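The projective yield function is short enough to write out directly. The sketch below assumes the nested-tuple tree encoding (symbol, children) and represents strings as lists of words. The small phrase-structure tree is a made-up example, not one of the paper's figures.

```python
def yd_pr(t):
    """Projective yield: concatenate the leaf words from left to right."""
    symbol, children = t
    if not children:            # rank 0: the yield is the word itself
        return [symbol]
    words = []
    for child in children:      # rank n > 0: concatenate the child yields
        words.extend(yd_pr(child))
    return words

# Hypothetical phrase-structure tree S(NP(John), VP(sleeps)).
tree = ("S", (("NP", (("John", ()),)),
              ("VP", (("sleeps", ()),))))
sentence = yd_pr(tree)
```

Because every subtree contributes one contiguous block of words, any device whose yield function is `yd_pr` can only ever combine adjacent substrings.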

We say that a string grammar is projective if its yield function is $\operatorname{\operatorname{\mathsf{yd}}_{pr}}$ . Context-free grammars as construed above are clearly projective. Tree-adjoining grammars are not projective: For instance, the yield of the subtree below “aastriiche” in Fig. 2b consists of the two separate strings “es Huus” and “aastriiche”, which are then wrapped around “lönd hälfed” further up in the derivation.

If the grammar is projective, then for any context $C$ there exist two strings $\operatorname{\mathsf{left}}(C)$ and $\operatorname{\mathsf{right}}(C)$ such that for any tree $t$ , $\operatorname{\mathsf{yd}}(C[t])=\operatorname{\mathsf{left}}(C)\cdot\operatorname{\mathsf{yd}}(t)\cdot\operatorname{\mathsf{right}}(C)$ .
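The strings $\operatorname{\mathsf{left}}(C)$ and $\operatorname{\mathsf{right}}(C)$ can be computed by walking down to the hole and collecting the yields of the siblings passed on the way. The following sketch assumes the nested-tuple tree encoding and the projective yield. The function names are illustrative assumptions.

```python
HOLE = ("X", ())

def yd_pr(t):
    """Projective yield as a list of words."""
    symbol, children = t
    if not children:
        return [symbol]
    return [w for c in children for w in yd_pr(c)]

def has_hole(t):
    return t == HOLE or any(has_hole(c) for c in t[1])

def plug(C, t):
    """C[t]: replace the hole in C by t."""
    if C == HOLE:
        return t
    symbol, children = C
    return (symbol, tuple(plug(c, t) for c in children))

def left_right(C):
    """Return (left(C), right(C)) for a context C with exactly one hole."""
    if C == HOLE:
        return [], []
    _, children = C
    for i, child in enumerate(children):
        if has_hole(child):
            inner_left, inner_right = left_right(child)
            before = [w for c in children[:i] for w in yd_pr(c)]
            after = [w for c in children[i + 1:] for w in yd_pr(c)]
            return before + inner_left, inner_right + after
    raise ValueError("not a context: no hole found")

# The context f(a, g(X, b), c) and the tree h(d, e).
C = ("f", (("a", ()), ("g", (HOLE, ("b", ()))), ("c", ())))
t = ("h", (("d", ()), ("e", ())))
left, right = left_right(C)
```

For this example the yield of $C[t]$ is `left`, then the yield of `t`, then `right`, which is the decomposition stated above.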

3.2 Context-free tree languages

Below, we will talk about linear monadic context-free tree grammars (LM-CFTGs; Rounds (1969), Comon et al. (2007)). An LM-CFTG is a quadruple $G=(N,\Sigma,R,S)$ , where $N$ is a ranked signature of nonterminals of rank at most one, $\Sigma$ is a ranked signature of terminals, $S\in N_{0}$ is the start symbol, and $R$ is a finite set of production rules of one of the forms

• $A\rightarrow t$ with $A\in N_{0}$ and $t\in\mathcal{T}_{V}$

• $A(t)\rightarrow C[t]$ with $A\in N_{1}$ and $C\in\mathcal{C}_{V}$ ,

where $V=N\cup\Sigma$ . The trees in $L(G)\subseteq\mathcal{T}_{\Sigma}$ are obtained by expanding $S$ with production rules. Nonterminals of rank zero are expanded by replacing them with trees. Nonterminals of rank one must have exactly one child in the tree; they are replaced by a context, and the variable in the context is replaced by the subtree below the child.

We can extend an LM-CFTG $G$ to a string grammar ${\mathcal{G}}=(G,\operatorname{\operatorname{\mathsf{yd}}_{pr}})$ . Then LM-CFTG is weakly equivalent to TAG Kepser and Rogers (2011); that is, LM-CFTG and TAG generate the same class of string languages. Intuitively, the weakly equivalent LM-CFTG directly describes the language of derived trees of the TAG grammar (cf. Fig. 2c). Notice that LM-CFTG is projective.

Below, we will make crucial use of the following pumping lemma for LM-CFTLs:

**Lemma 1** (Maibaum (1978)).

Let $G$ be an LM-CFTG. There exists a constant $p\in\mathbb{N}_{0}$ such that for any $t\in L(G)$ with $\operatorname{\mathsf{ht}}(t)>p$ , there exists a decomposition $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ with $\operatorname{\mathsf{ht}}(C_{2}[C_{3}[C_{4}[X]]])\leq p$ and $\operatorname{\mathsf{ht}}(C_{2})+\operatorname{\mathsf{ht}}(C_{4})>0$ such that for any $i\in\mathbb{N}_{0}$ , $C_{1}[v^{i}[t_{5}]]\in L(G)$ , where we let $v^{0}=C_{3}$ and $v^{i+1}=C_{2}[v^{i}[C_{4}[X]]].$

We call $p$ the pumping height of $L(G)$ .
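The pumped trees $C_{1}[v^{i}[t_{5}]]$ can be built mechanically from a decomposition. The sketch below assumes the nested-tuple tree encoding. The concrete contexts are made up to show the shape of the construction and are not derived from any particular LM-CFTG.

```python
HOLE = ("X", ())

def plug(C, t):
    """C[t]: replace the hole in C by t (t may itself be a context)."""
    if C == HOLE:
        return t
    symbol, children = C
    return (symbol, tuple(plug(c, t) for c in children))

def yd_pr(t):
    """Projective yield as a list of words."""
    symbol, children = t
    if not children:
        return [symbol]
    return [w for c in children for w in yd_pr(c)]

def pumped(C1, C2, C3, C4, t5, i):
    """Return C1[v^i[t5]] with v^0 = C3 and v^(i+1) = C2[v^i[C4[X]]]."""
    v = C3
    for _ in range(i):
        v = plug(C2, plug(v, C4))   # C4 stands for the context C4[X]
    return plug(C1, plug(v, t5))

# Illustrative decomposition: C2 adds an "a" on the left of the hole,
# C4 adds a "b" on the right of the hole.
C1 = HOLE
C2 = ("f", (("a", ()), HOLE))
C3 = HOLE
C4 = ("g", (HOLE, ("b", ())))
t5 = ("c", ())

yields = [yd_pr(pumped(C1, C2, C3, C4, t5, i)) for i in range(3)]
```

With these choices, $i=1$ reproduces the original tree $C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$, and each further pumping step adds one copy of $C_{2}$ above and one copy of $C_{4}$ below, so the material contributed by $C_{2}$ and by $C_{4}$ always grows in lockstep.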

3.3 The HR algebra

The specific unification-style semantic algebra we use in this paper is the HR algebra Courcelle and Engelfriet (2012). This choice encompasses much of the recent literature on compositional semantic parsing with graphs, based e.g. on Hyperedge Replacement Grammars Chiang et al. (2013); Peng et al. (2015); Koller (2015) and the AM algebra Groschwitz et al. (2018).

The values of the HR algebra are s-graphs: directed, edge-labeled graphs, some of whose nodes may be designated as sources, written in angle brackets. S-graphs can be combined using the forget, rename, and merge operations. Rename $\mathsf{ren}_{\mathsf{a}\rightarrow\mathsf{b}}$ changes an $\mathsf{a}$ -source node into a $\mathsf{b}$ -source node. Forget $\mathsf{f}_{\mathsf{a}}$ makes it so the $\mathsf{a}$ -source node in the s-graph is no longer a source node. Merge $\;||\;$ combines two s-graphs while unifying nodes with the same source annotation. For instance, the s-graphs $\langle\mbox{$ \mathsf{rt} $}\rangle\xrightarrow{\mbox{ARG1}}\langle\mbox{$ \mathsf{o} $}\rangle$ and $\langle\mbox{$ \mathsf{o} $}\rangle\leavevmode\hbox to22pt{\vbox to14.18pt{\pgfpicture\makeatletter\hbox{\hskip 1.97176pt\lower-4.30202pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }{{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor[named]{pgfstrokecolor}{rgb}{1,1,1}\pgfsys@color@gray@stroke{1}\pgfsys@invoke{ }\definecolor[named]{pgffillcolor}{rgb}{1,1,1}\pgfsys@color@gray@fill{1}\pgfsys@invoke{ }{{}{{{}}}{{}}{}{}{}{}{}{}{}{}{}{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor[named]{pgfstrokecolor}{rgb}{1,1,1}\pgfsys@color@gray@stroke{1}\pgfsys@invoke{ }\definecolor[named]{pgffillcolor}{rgb}{1,1,1}\pgfsys@color@gray@fill{1}\pgfsys@invoke{ 
}{}\pgfsys@moveto{1.42264pt}{0.0pt}\pgfsys@curveto{1.42264pt}{0.7857pt}{0.7857pt}{1.42264pt}{0.0pt}{1.42264pt}\pgfsys@curveto{-0.7857pt}{1.42264pt}{-1.42264pt}{0.7857pt}{-1.42264pt}{0.0pt}\pgfsys@curveto{-1.42264pt}{-0.7857pt}{-0.7857pt}{-1.42264pt}{0.0pt}{-1.42264pt}\pgfsys@curveto{0.7857pt}{-1.42264pt}{1.42264pt}{-0.7857pt}{1.42264pt}{0.0pt}\pgfsys@closepath\pgfsys@moveto{0.0pt}{0.0pt}\pgfsys@fillstroke\pgfsys@invoke{ } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{0.0pt}{0.0pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} {{}}{}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}}{} {{}{}{{}}{}}{{}{}{{}}{}}{{}{}}{{}} {{}{}{{}}{}}{{{}}{{}}}{{}}{{}{}{{}}{}}{{{}}{{}}}{{}}{}{{}}{{{{{}}{}{}{}{}{{}}}}}{{}}{}{{{{{{}}{}{}{}{}{{}}}}}{}{}{}{}}{}{}{}{}{{}}{{}}{{}}{}{{ {\pgfsys@beginscope\pgfsys@setlinewidth{0.32pt}\pgfsys@setdash{}{0.0pt}\pgfsys@roundcap\pgfsys@roundjoin{} {}{}{} {}{}{} \pgfsys@moveto{-1.19998pt}{1.59998pt}\pgfsys@curveto{-1.09998pt}{0.99998pt}{0.0pt}{0.09999pt}{0.29999pt}{0.0pt}\pgfsys@curveto{0.0pt}{-0.09999pt}{-1.09998pt}{-0.99998pt}{-1.19998pt}{-1.59998pt}\pgfsys@stroke\pgfsys@endscope}} }{}{}{{}}\pgfsys@moveto{1.56734pt}{0.41997pt}\pgfsys@curveto{15.30898pt}{4.10202pt}{15.30898pt}{-4.10202pt}{2.97757pt}{-0.79784pt}\pgfsys@stroke\pgfsys@invoke{ }{{}{{}}{}{}{{}}{{{}}{{{}}{\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{-0.96593}{0.25882}{-0.25882}{-0.96593}{2.97757pt}{-0.79784pt}\pgfsys@invoke{ }\pgfsys@invoke{ \lxSVG@closescope }\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}{{}}}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{ {}{}}}{ {}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ 
}\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{-1.97176pt}{3.04527pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{{Hans}}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} {{{}{}}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{ {}{}}}{ {}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{0.0pt}{-2.15277pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{{ {}{}{}}}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}\lxSVG@closescope\endpgfpicture}}$ are merged into $\langle\mbox{$ \mathsf{rt} $}\rangle\xrightarrow{\mbox{ARG0}}\langle\mbox{$ \mathsf{o} $}\rangle\leavevmode\hbox to22pt{\vbox to14.18pt{\pgfpicture\makeatletter\hbox{\hskip 1.97176pt\lower-4.30202pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }{{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor[named]{pgfstrokecolor}{rgb}{1,1,1}\pgfsys@color@gray@stroke{1}\pgfsys@invoke{ }\definecolor[named]{pgffillcolor}{rgb}{1,1,1}\pgfsys@color@gray@fill{1}\pgfsys@invoke{ }{{}{{{}}}{{}}{}{}{}{}{}{}{}{}{}{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor[named]{pgfstrokecolor}{rgb}{1,1,1}\pgfsys@color@gray@stroke{1}\pgfsys@invoke{ 
}\definecolor[named]{pgffillcolor}{rgb}{1,1,1}\pgfsys@color@gray@fill{1}\pgfsys@invoke{ }{}\pgfsys@moveto{1.42264pt}{0.0pt}\pgfsys@curveto{1.42264pt}{0.7857pt}{0.7857pt}{1.42264pt}{0.0pt}{1.42264pt}\pgfsys@curveto{-0.7857pt}{1.42264pt}{-1.42264pt}{0.7857pt}{-1.42264pt}{0.0pt}\pgfsys@curveto{-1.42264pt}{-0.7857pt}{-0.7857pt}{-1.42264pt}{0.0pt}{-1.42264pt}\pgfsys@curveto{0.7857pt}{-1.42264pt}{1.42264pt}{-0.7857pt}{1.42264pt}{0.0pt}\pgfsys@closepath\pgfsys@moveto{0.0pt}{0.0pt}\pgfsys@fillstroke\pgfsys@invoke{ } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{0.0pt}{0.0pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} {{}}{}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}}{} {{}{}{{}}{}}{{}{}{{}}{}}{{}{}}{{}} {{}{}{{}}{}}{{{}}{{}}}{{}}{{}{}{{}}{}}{{{}}{{}}}{{}}{}{{}}{{{{{}}{}{}{}{}{{}}}}}{{}}{}{{{{{{}}{}{}{}{}{{}}}}}{}{}{}{}}{}{}{}{}{{}}{{}}{{}}{}{}{}{}{{}}\pgfsys@moveto{1.56734pt}{0.41997pt}\pgfsys@curveto{15.30898pt}{4.10202pt}{15.30898pt}{-4.10202pt}{2.97757pt}{-0.79784pt}\pgfsys@stroke\pgfsys@invoke{ }{{}{{}}{}{}{{}}{{{}}{{{}}{\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{-0.96593}{0.25882}{-0.25882}{-0.96593}{2.97757pt}{-0.79784pt}\pgfsys@invoke{ }\pgfsys@invoke{ \lxSVG@closescope }\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}{{}}}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{ {}{}}}{ {}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{-1.97176pt}{3.04527pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{{Hans}}} 
}}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} {{{}{}}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{ {}{}}}{ {}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{0.0pt}{-2.15277pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{{ {}{}{}}}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}\lxSVG@closescope\endpgfpicture}}$ .

The HR algebra uses operation symbols from a ranked signature $\Delta$ to describe s-graphs syntactically. $\Delta$ contains symbols for merge (rank 2) and the forget and rename operations (rank 1). It also contains constants (symbols of rank 0) which denote s-graphs of the form $\langle\mbox{$ \mathsf{a} $}\rangle\xrightarrow{\mbox{f}}\langle\mbox{$ \mathsf{b} $}\rangle$ and s-graphs consisting of a single source node $\langle\mbox{$ \mathsf{a} $}\rangle$ with an $f$ -labeled loop edge, where $\mathsf{a},\mathsf{b}$ are sources and $f$ is an edge label. Terms $t\in\mathcal{T}_{\Delta}$ over this signature evaluate recursively to s-graphs $\llbracket t\rrbracket$ , as usual in an algebra. Each instance of the HR algebra uses a fixed, finite set of $k$ source names which can be used in the constant s-graphs and the rename and forget operations. The class of graphs which can be expressed as values of terms over the algebra increases with $k$ . We write $\mathcal{H}_{k}$ for the HR algebra with $k$ source names (and some set of edge labels).
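The operations of the HR algebra described above can be sketched in a few lines of Python. This is only an illustrative sketch: the representation (integer node ids, edges as triples, sources as name/node pairs), the names `SGraph`, `const_edge`, `forget`, `rename` and `merge`, and the edge label `poss` are assumptions made for this example, and parallel edges with identical labels are collapsed for simplicity.

```python
from dataclasses import dataclass
from itertools import count

_fresh = count()  # globally fresh node ids, so two constants never share a node


@dataclass(frozen=True)
class SGraph:
    edges: frozenset  # set of (node, label, node) triples
    sources: tuple    # sorted tuple of (source_name, node) pairs


def const_edge(a, label, b):
    """Constant <a> --label--> <b>; a == b gives a loop on a single source node."""
    u = next(_fresh)
    v = u if a == b else next(_fresh)
    return SGraph(frozenset({(u, label, v)}), tuple(sorted({a: u, b: v}.items())))


def forget(g, a):
    """Drop source name a; the node stays but can no longer be merged."""
    return SGraph(g.edges, tuple((s, n) for s, n in g.sources if s != a))


def rename(g, a, b):
    """Rename source a to b (this sketch assumes b is not already in use)."""
    assert b not in dict(g.sources)
    return SGraph(g.edges, tuple(sorted((b if s == a else s, n) for s, n in g.sources)))


def merge(g1, g2):
    """Union of g1 and g2 in which nodes carrying the same source name are fused."""
    parent = {}

    def find(x):
        while parent.get(x, x) != x:
            x = parent[x]
        return x

    s1, s2 = dict(g1.sources), dict(g2.sources)
    for name in s1.keys() & s2.keys():
        r1, r2 = find(s1[name]), find(s2[name])
        if r1 != r2:
            parent[r2] = r1
    edges = frozenset((find(u), lab, find(v)) for u, lab, v in g1.edges | g2.edges)
    sources = {name: find(n) for name, n in {**s1, **s2}.items()}
    return SGraph(edges, tuple(sorted(sources.items())))


# Plug a one-source argument graph into the o-slot of an ARG1 edge, then forget o.
arg1 = const_edge("rt", "ARG1", "o")
child = forget(const_edge("rt", "poss", "o"), "o")  # hypothetical argument graph
result = forget(merge(arg1, rename(child, "rt", "o")), "o")
print(sorted(lab for _, lab, _ in result.edges), [s for s, _ in result.sources])
```

The final term uses only the two source names `rt` and `o`, so under these assumptions it would be a term over $\mathcal{H}_{2}$ .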

Let $G$ be an s-graph, and let $G^{\prime}$ be a subgraph of $G$ , i.e. a subset of its edges. We call a node a boundary node of $G^{\prime}$ if it is incident both to an edge in $G^{\prime}$ and to an edge that is not in $G^{\prime}$ . For instance, the s-graph in Fig. 2e is a subgraph of the one in Fig. 2d; the boundary nodes are drawn shaded in (d). The following lemma holds:

*Lemma 2**.*

Let $G=\llbracket C[t]\rrbracket$ be an s-graph, and let $G^{\prime}$ be a subgraph of $G$ such that the s-graph $\llbracket t\rrbracket$ contains the same edges as $G^{\prime}$ . Then every boundary node in $G^{\prime}$ is a source in $\llbracket t\rrbracket$ .
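The notion of boundary node used in Lemma 2 is easy to compute. The following minimal sketch assumes that a graph is given as a list of `(node, label, node)` triples and a subgraph as a subset of those triples; the function name `boundary_nodes` is hypothetical.

```python
def boundary_nodes(edges, sub_edges):
    """Nodes incident both to an edge of the subgraph and to an edge outside it."""
    sub = set(sub_edges)
    assert sub <= set(edges), "the subgraph must be a subset of the edges"
    inside = {n for (u, _, v) in sub for n in (u, v)}
    outside = {n for (u, _, v) in set(edges) - sub for n in (u, v)}
    return inside & outside


# A chain 1 -a-> 2 -b-> 3 -c-> 4; the middle edge touches the rest at both ends.
chain = [(1, "a", 2), (2, "b", 3), (3, "c", 4)]
print(boundary_nodes(chain, [(2, "b", 3)]))
```

By Lemma 2, any term that generates exactly the middle edge as a subterm value would need both of its end nodes as sources.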

3.4 Grammars with semantic interpretations

Finally, we extend string grammars to compositionally relate strings with semantic representations. Let ${\mathcal{G}}=(G,\operatorname{\mathsf{yd}})$ be a string grammar. The tree grammar $G$ generates a language $L(G)\subseteq\mathcal{T}_{\Sigma}$ of trees. We will map each tree $t\in L(G)$ into a term $h(t)$ over some algebra $\mathcal{A}$ over a signature $\Delta$ using a linear tree homomorphism (LTH) $h:\mathcal{T}_{\Sigma}\rightarrow\mathcal{T}_{\Delta}$ Comon et al. (2007), i.e. by compositional bottom-up evaluation. This defines a relation between strings and values of $\mathcal{A}$ :

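$$\mathcal{R}\mathcal{E}\mathcal{L}({\mathcal{G}},h,\mathcal{A})=\{(\operatorname{\mathsf{yd}}(t),\llbracket h(t)\rrbracket_{\mathcal{A}})\mid t\in L(G)\}$$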

For instance, $\mathcal{A}$ could be some HR algebra $\mathcal{H}_{k}$ ; then $\mathcal{R}\mathcal{E}\mathcal{L}({\mathcal{G}},h,\mathcal{H}_{k})$ will be a binary relation between strings and s-graphs. In this case, we abbreviate $\llbracket h(t)\rrbracket$ as $\operatorname{\mathsf{graph}}(t)$ .

If we look at an entire class ${\mathbb{G}}$ of string grammars and a fixed algebra, this defines a class of such relations:

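$$\mathcal{L}({\mathbb{G}},\mathcal{A})=\{\mathcal{R}\mathcal{E}\mathcal{L}({\mathcal{G}},h,\mathcal{A})\mid{\mathcal{G}}\in{\mathbb{G}},\mbox{$h$ LTH}\}.$$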

In the example in Fig. 2, we can define a linear homomorphism $h$ to map the derivation tree $t$ in (b) to a term $h(t)$ which evaluates to the s-graph shown in (d). At the top of this term, the s-graphs at the “chind” and “hälfed” (f,g) nodes are combined into (d) by $h(\mbox{lönd})$ :

[TABLE]

This non-projective derivation produces the s-graph in (d) using only two sources, $\mathsf{rt}$ and $\mathsf{o}$ . By contrast, a homomorphic interpretation of the projective tree (c) has to use at least four sources, as the intermediate result in (e) illustrates.

4 Projective cross-serial dependencies

We will now investigate the ability of projective grammar formalisms $({\mathbb{G}},\mathcal{H}_{k})$ to express $\mathcal{L}(\mbox{TAG},\mathcal{H}_{2})$ . We will define a relation $\mathsf{CSD}\in\mathcal{L}(\mbox{TAG},\mathcal{H}_{2})$ and prove that $\mathsf{CSD}$ cannot be generated by projective grammar formalisms with bounded $k$ . We show this first for arbitrary projective ${\mathbb{G}}$ , under certain assumptions on the alignment of words and graph edges. In Section 5, we drop these assumptions, but focus on ${\mathbb{G}}=\mbox{LM-CFTG}$ .

4.1 The relation $\mathsf{CSD}$

To construct $\mathsf{CSD}$ , consider the string language $\mathsf{CSD}_{s}=\{A^{n}B^{m}C^{n}D^{m}\mid m,n\geq 1\}$ , where

$$A=\{a\,\langle^{k}\,\overline{a}^{k}\,\rangle^{k}\mid k\geq 0\}$$

and analogously for $B,C,D$ . An example string in $\mathsf{CSD}_{s}$ is $a\langle\langle\overline{a}\overline{a}\rangle\rangle\;b\langle\overline{b}\rangle\;b\;c\langle\overline{c}\rangle\;d\;d.$ Note that $k$ can be chosen independently for each segment.

Every string $w\in\mathsf{CSD}_{s}$ can be uniquely described by $m$ , $n$ , and a sequence $K(w)=(K^{(a)},K^{(b)},K^{(c)},K^{(d)})$ of numbers specifying the $k$ ’s used in each segment, where $K^{(a)},K^{(c)}$ each contain $n$ numbers and $K^{(b)},K^{(d)}$ contain $m$ numbers. In the example, we have $n=1$ , $m=2$ , and $K(w)=((2),(1,0),(1),(0,0))$ .
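The description of $w$ by $n$ , $m$ and $K(w)$ can be made concrete with a small parser. This is a sketch under an encoding assumption: the barred symbols are written as the capital letters `A`, `B`, `C`, `D` and the angle brackets as `<` and `>`; the function name `parse_csd` is hypothetical.

```python
import re

# ASCII stand-ins (an assumption of this sketch): A, B, C, D for the barred
# symbols and < > for the angle brackets.
SEGMENT = re.compile(r"([abcd])(<*)([ABCD]*)(>*)")


def parse_csd(w):
    """Return (n, m, K(w)) for a string of CSD_s, or raise ValueError."""
    segments, pos = [], 0
    while pos < len(w):
        match = SEGMENT.match(w, pos)
        if match is None:
            raise ValueError(f"unexpected symbol at position {pos}")
        x, opening, bars, closing = match.groups()
        k = len(bars)
        # A segment is x followed by k opening brackets, k bars of x, k closing brackets.
        if len(opening) != k or len(closing) != k or bars != x.upper() * k:
            raise ValueError(f"malformed segment at position {pos}")
        segments.append((x, k))
        pos = match.end()
    letters = "".join(x for x, _ in segments)
    shape = re.fullmatch(r"(a+)(b+)(c+)(d+)", letters)
    if shape is None or len(shape[1]) != len(shape[3]) or len(shape[2]) != len(shape[4]):
        raise ValueError("letters are not of the form a^n b^m c^n d^m")
    K = tuple(tuple(k for x, k in segments if x == z) for z in "abcd")
    return len(shape[1]), len(shape[2]), K


# The example string from the text.
print(parse_csd("a<<AA>>b<B>bc<C>dd"))  # (1, 2, ((2,), (1, 0), (1,), (0, 0)))
```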

We associate a graph $G_{w}$ with each string $w\in\mathsf{CSD}_{s}$ by the construction illustrated in Fig. 3. For each $1\leq i\leq n$ , we define the $i$ -th $a$ -block to be the graph consisting of nodes $u\xrightarrow{\mbox{c}}v$ with a further outgoing $a$ -edge from $u$ . In addition, $u$ connects to a linear chain of $K^{(a)}_{i}$ edges with label $\overline{a}$ , and $v$ to a linear chain of $K^{(c)}_{i}$ $\overline{c}$ -edges. $G_{w}$ consists of a linear chain of the $n$ $a$ -blocks, followed by the $m$ $b$ -blocks (defined analogously). We let $\mathsf{CSD}=\{(w,G_{w})\mid w\in\mathsf{CSD}_{s}\}$ .
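As a sanity check on this construction, one can list the edge labels that each block contributes. The sketch below ignores the node structure and records only label counts per block; it assumes the same ASCII encoding as an illustration (capital letters for barred symbols), and the function name `blocks` is hypothetical.

```python
from collections import Counter


def blocks(n, m, K):
    """Edge-label counts per block: the n a-blocks followed by the m b-blocks."""
    Ka, Kb, Kc, Kd = K
    assert len(Ka) == len(Kc) == n and len(Kb) == len(Kd) == m
    # The i-th a-block has one a-edge, one c-edge and two chains of barred edges.
    a_blocks = [Counter({"a": 1, "c": 1, "A": Ka[i], "C": Kc[i]}) for i in range(n)]
    # The b-blocks are defined analogously, with b and d in place of a and c.
    b_blocks = [Counter({"b": 1, "d": 1, "B": Kb[i], "D": Kd[i]}) for i in range(m)]
    return a_blocks + b_blocks


# For the example string, the graph has one edge per non-bracket token.
example = "a<<AA>>b<B>bc<C>dd"
total = sum(blocks(1, 2, ((2,), (1, 0), (1,), (0, 0))), Counter())
print(total == Counter(ch for ch in example if ch not in "<>"))  # True
```

The check reflects the fact that edges of a block are labeled by symbols that may be far apart in the string: the $a$ - and $c$ -tokens of one block belong to different segments of $w$ .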

Note that $\mathsf{CSD}$ is a more intricate version of the cross-serial dependency language. $\mathsf{CSD}$ can be generated by a TAG grammar along the lines of the one from Section 3.4, using a HR algebra with two sources; thus $\mathsf{CSD}\in{\mathcal{L}}(\mbox{TAG},\mathcal{H}_{2})$ .

4.2 $\mathsf{CSD}$ with bounded blocks

The characteristic feature of $\mathsf{CSD}$ is that edges which are close together in the graph (e.g. the $a$ and $c$ edge in an $a$ -block) correspond to symbols that can be distant in the string (e.g. $a$ and $c$ tokens). Projective grammars cannot combine predicates ( $a$ ) and arguments ( $c$ ) directly because of their distance in the string; intuitively, they must keep track of either the $c$ ’s or the $a$ ’s for a long time, which cannot be done with a bounded $k$ .

Before we go into exploiting this intuition, we first note that its correctness depends on the details of the construction of $\mathsf{CSD}$ , in particular the ability to select arbitrary and independent $K^{(x)}$ for the different $x\in\{a,b,c,d\}$ . Consider the derivation $t$ on the left of Fig. 4 with its projective yield $abbcdd$ ; this is the case $((0),(0,0),(0),(0,0))$ of $\mathsf{CSD}$ , corresponding to the $\mathsf{CSD}$ graph $G_{1}$ shown in Fig. 4 (a). We can map $t$ to this graph by applying the following linear tree homomorphism $h$ into $\mathcal{H}_{2}$ :

[TABLE]

A derivation of the form $*_{0}(t_{1},t_{2})$ evaluates to the same graph as $t_{1}$ ; the graph value of $t_{2}$ is ignored. Thus if we assume that the subtree of $t$ for $cdd$ evaluates to some arbitrary graph $G_{0}$ , the complete derivation $t$ evaluates to $G_{1}$ . Some intermediate results are shown on the right of Fig. 4.

If we let $\mathsf{CSD}_{0}$ be the subset of $\mathsf{CSD}$ where all $K^{(x)}$ are zero, we can generalize this construction into an LM-CFTG which generates $\mathsf{CSD}_{0}$ . Thus, $\mathsf{CSD}_{0}$ can be generated by a projective grammar that is interpreted into $\mathcal{H}_{2}$ . But note that the derivation in Fig. 4 is unnatural in that the symbols in the string are not generated by the same derivation steps that generate the graph nodes that intuitively correspond to them; for instance, the graphs generated for the $d$ tokens are completely irrelevant. Below, we prevent unnatural constructions like this in two ways. We will first assume that string symbols and graph nodes must be aligned (Thm. 1). Then we will assume that the $K^{(x)}$ can be arbitrary, which allows us to drop the alignment assumption (Thm. 2).

4.3 $k$ -distant trees

Let ${\mathbf{R}}\supseteq\mathsf{CSD}_{0}$ be some relation containing at least the string-graph pairs of $\mathsf{CSD}_{0}$ , e.g. $\mathsf{CSD}$ itself. Assume that ${\mathbf{R}}$ is generated by a projective grammar $({\mathcal{G}},h)$ with ${\mathcal{G}}=(G,\operatorname{\operatorname{\mathsf{yd}}_{pr}})$ and a fixed number $k$ of sources, i.e. we have ${\mathbf{R}}=\mathcal{R}\mathcal{E}\mathcal{L}({\mathcal{G}},h,\mathcal{H}_{k})$ . We will prove a contradiction.

Given a pair $(w,G_{w})\in{\mathbf{R}}$ , we say that two edges $e,f$ in $G_{w}$ are equivalent, $e\equiv f$ , if they belong to the same block. We call a derivation tree $t\in{\mathbf{T}}=L(G)$ $k$ -distant if $t$ has a subtree $t^{\prime}$ such that we can find $k$ edges $e_{1},\ldots,e_{k}\in\operatorname{\mathsf{graph}}(t^{\prime})$ with $e_{i}\not\equiv e_{j}$ for all $i\neq j$ and $k$ further edges $e^{\prime}_{1},\ldots,e^{\prime}_{k}\in G_{w}\backslash\operatorname{\mathsf{graph}}(t^{\prime})$ such that $e_{i}\equiv e^{\prime}_{i}$ for all $i$ . For such trees, we have the following lemma.

*Lemma 3**.*

A $k$ -distant tree has a subtree $t^{\prime}$ such that $\operatorname{\mathsf{graph}}(t^{\prime})$ has at least $k$ sources.

*Proof 1**.*

Let $\mbox{BK}_{i}$ be the $i$ -th block in $G_{w}$ ; we let $1\leq i\leq m+n$ and do not distinguish between $a$ - and $b$ -blocks. Let $t^{\prime}$ be the subtree of $t$ claimed by the definition of distant trees. For each $i$ , let $E^{\prime}_{i}=\mbox{BK}_{i}\cap\operatorname{\mathsf{graph}}(t^{\prime})$ be the edges in the $i$ -th block generated by $t^{\prime}$ , and let $E_{i}=\mbox{BK}_{i}\backslash E^{\prime}_{i}$ .

By definition, $E_{i}$ and $E^{\prime}_{i}$ are both non-empty for at least $k$ blocks. Each of these blocks is weakly connected, and thus contains at least one node $u_{i}$ which is incident both to an edge in $E_{i}$ and in $E^{\prime}_{i}$ . This node is a boundary node of $\operatorname{\mathsf{graph}}(t^{\prime})$ . Because $u_{1},\ldots,u_{k}$ are all distinct, it follows from Lemma 2 that $\operatorname{\mathsf{graph}}(t^{\prime})$ has at least $k$ sources.
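The counting step in this proof amounts to finding the blocks that are split between $\operatorname{\mathsf{graph}}(t^{\prime})$ and the rest of $G_{w}$ . A minimal sketch, assuming edges are given as hashable ids together with a map from each edge to its block index (the names `split_blocks` and `block_of` are hypothetical):

```python
def split_blocks(block_of, sub_edges):
    """Blocks with at least one edge inside sub_edges and at least one outside."""
    sub = set(sub_edges)
    inside = {b for e, b in block_of.items() if e in sub}
    outside = {b for e, b in block_of.items() if e not in sub}
    return inside & outside


# Three blocks with an a-edge and a c-edge each; edges are (label, block) pairs.
block_of = {(lab, i): i for i in range(3) for lab in ("a", "c")}
a_edges = [("a", i) for i in range(3)]
print(len(split_blocks(block_of, a_edges)))  # 3
```

A subtree whose graph consists of exactly the three $a$ -edges splits all three blocks, so it witnesses that the tree is $3$ -distant; by the lemma, its graph then needs at least three sources.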

We also note the following lemma about derivations of projective string grammars, which follows from the inability of projective grammars to combine distant tokens. We write $\mathsf{Sep}=\{a/c,c/a,b/d,d/b\}$ .

*Lemma 4**.*

Let ${\mathcal{G}}=(G,\operatorname{\mathsf{yd}})$ be a projective string grammar. For any $r\in\mathbb{N}_{0}$ there exists $s\in\mathbb{N}_{0}$ such that any $t\in L(G)$ with $\operatorname{\mathsf{yd}}(t)\in a^{*}b^{s}c^{s}d^{*}$ has a subtree $t^{\prime}$ such that $\operatorname{\mathsf{yd}}(t^{\prime})$ contains $r$ occurrences of $x$ and no occurrences of $y$ , for some $x/y\in\mathsf{Sep}$ .
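To see the lemma at work, one can search a projective derivation for such a subtree. The sketch below makes several simplifying assumptions: derivation trees are nested tuples with single-character leaves, the projective yield is plain left-to-right concatenation, and "contains $r$ occurrences" is read as "at least $r$ ". All function names are hypothetical.

```python
SEP = [("a", "c"), ("c", "a"), ("b", "d"), ("d", "b")]


def yd(t):
    """Projective yield: concatenate the leaves from left to right."""
    return t if isinstance(t, str) else "".join(yd(c) for c in t)


def subtrees(t):
    """All subtrees of t in preorder, including t itself."""
    yield t
    if not isinstance(t, str):
        for c in t:
            yield from subtrees(c)


def witness(t, r):
    """First subtree yield with at least r x-tokens and no y-token, x/y in Sep."""
    for sub in subtrees(t):
        w = yd(sub)
        for x, y in SEP:
            if w.count(x) >= r and y not in w:
                return x, y, w
    return None


def right_branching(w):
    return w if len(w) == 1 else (w[0], right_branching(w[1:]))


def balanced(w):
    mid = len(w) // 2
    return w if len(w) == 1 else (balanced(w[:mid]), balanced(w[mid:]))


w = "a" * 4 + "b" * 4 + "c" * 4 + "d" * 4
print(witness(balanced(w), 3))         # ('a', 'c', 'aaaabbbb')
print(witness(right_branching(w), 3))  # ('c', 'a', 'bbbbccccdddd')
```

Because every subtree of a projective derivation covers a contiguous substring, a subtree that contains both $a$ 's and $c$ 's must contain all $b$ 's, which is what forces a witness to exist in these examples.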

4.4 Projectivity and alignments

A consequence of Lemma 3 is that if certain string-graph pairs in $\mathsf{CSD}_{0}$ can only be expressed with $k{+}1$ -distant trees, then ${\mathbf{R}}$ (which contains these pairs as well) is not in $\mathcal{L}({\mathcal{G}},\mathcal{H}_{k})$ , because $\mathcal{H}_{k}$ only admits $k$ sources.

However, as we saw in Section 4.2, pairs in $\mathsf{CSD}_{0}$ can have unexpected projective derivations which make do with a low number of sources. So let’s assume for now that the string grammar ${\mathcal{G}}$ and the tree homomorphism $h$ produce tokens and edge labels that fit together. Let us call ${\mathcal{G}},h$ aligned if for all constants $c\in\Sigma_{0}$ , $\operatorname{\mathsf{graph}}(c)$ is a graph containing a single edge with label $\operatorname{\mathsf{yd}}(c)$ . The derivation in Fig. 4 cannot be generated by an aligned grammar because the graph for the token $b$ contains a $d$ -edge. We write ${\mathcal{L}}_{\leftrightarrow}({\mathbb{G}},\mathcal{A})=\{\mathcal{R}\mathcal{E}\mathcal{L}({\mathcal{G}},h,\mathcal{A})\mid\mbox{${\mathcal{G}}\in{\mathbb{G}}$ and ${\mathcal{G}},h$ aligned}\}$ for the class of string-semantics relations which can be generated with aligned grammars.

Under this assumption, it is easy to see that any relation including $\mathsf{CSD}_{0}$ (hence, $\mathsf{CSD}$ ) cannot be expressed with a projective grammar.

*Theorem 1**.*

Let ${\mathbb{G}}$ be any class of projective string grammars and ${\mathbf{R}}\supseteq\mathsf{CSD}_{0}$ . For any $k$ , ${\mathbf{R}}\not\in{\mathcal{L}}_{\leftrightarrow}({\mathbb{G}},\mathcal{H}_{k})$ .

*Proof 2**.*

Assume that there is a ${\mathcal{G}}=(G,\operatorname{\operatorname{\mathsf{yd}}_{pr}})\in{\mathbb{G}}$ and an LTH $h$ such that ${\mathbf{R}}=\mathcal{R}\mathcal{E}\mathcal{L}({\mathcal{G}},h,\mathcal{H}_{k})$ . Given $k$ , choose $s\in\mathbb{N}_{0}$ such that every tree $t\in{\mathbf{T}}=L(G)$ with $\operatorname{\mathsf{yd}}(t)=a^{s}b^{s}c^{s}d^{s}$ has a subtree $t^{\prime}$ such that $\operatorname{\mathsf{yd}}(t^{\prime})$ contains $k+1$ occurrences of $x$ and no occurrences of $y$ , for some $x/y\in\mathsf{Sep}$ . Such an $s$ exists according to Lemma 4. We can choose $t$ such that $(\operatorname{\mathsf{yd}}(t),\operatorname{\mathsf{graph}}(t))\in\mathsf{CSD}_{0}$ .

Because ${\mathcal{G}},h$ are aligned, $\operatorname{\mathsf{graph}}(t^{\prime})$ contains no $y$ -edge and at least $k+1$ $x$ -edges. Each of these $x$ -edges is non-equivalent to all the others, and equivalent to a $y$ -edge in $\operatorname{\mathsf{graph}}(t)\backslash\operatorname{\mathsf{graph}}(t^{\prime})$ , so $t$ is $k{+}1$ -distant. It follows from Lemma 3 that $\operatorname{\mathsf{graph}}(t^{\prime})$ has at least $k+1$ sources, in contradiction to the assumption that ${\mathcal{G}},h$ uses only $k$ sources.

5 Expressive capacity of LM-CFTG

Thm. 1 is a powerful result which shows that $\mathsf{CSD}$ cannot be generated by any device for generating projective derivations using bounded placeholder memory – if we can assume that tokens and edges are aligned. We will now drop this assumption and prove that $\mathsf{CSD}$ cannot be generated using a fixed set of placeholders using LM-CFTG, regardless of alignment. The basic proof idea is to enforce a weak form of alignment through the interaction of the pumping lemma with very long $\overline{x}$ -chains. The result is remarkable in that LM-CFTG and TAG are weakly equivalent; they only differ in whether they must derive the strings projectively or not.

*Theorem 2**.*

$\mathsf{CSD}\not\in{\mathcal{L}}(\mbox{LM-CFTG},\mathcal{H}_{k})$ , for any $k$ .

5.1 Asynchronous derivations

Assume that $\mathsf{CSD}=\mathcal{R}\mathcal{E}\mathcal{L}({\mathcal{G}},h,\mathcal{H}_{k})$ , for some $k$ , with ${\mathcal{G}}=(G,\operatorname{\mathsf{yd}})$ an LM-CFTG. Proving that this is a contradiction hinges on a somewhat technical concept of asynchronous derivations, which have to do with how the nodes generating edge labels such as $\overline{a}$ are distributed over a derivation tree. We prove that all asynchronous derivations of certain elements of $\mathsf{CSD}$ are distant (Lemma 5), and that all LM-CFTG derivations of $\mathsf{CSD}$ are asynchronous (Lemma 6), which proves Thm. 2.

In what follows, let ${\mathbf{T}}=L(G)$ . For any tree or context $t$ and symbol $x$ , we write $n^{t}_{x}$ as a shorthand for $|\operatorname{\mathsf{yd}}(t)|_{x}$ , $e^{t}_{x}$ for the number of $x$ -edges in $\operatorname{\mathsf{graph}}(t)$ , and $m^{t}_{\overline{x}}$ for the maximum length of a string in $\overline{x}^{*}$ which is also a substring of $\operatorname{\mathsf{yd}}(t)$ .

*Definition 1** ( $x,y,l$ -asynchronous derivation).*

*Let $x/y\in\mathsf{Sep}$ , $l>0$ , and $t\in{\mathbf{T}}$ . We call $t$ an $x,y,l$ -asynchronous derivation iff there is a decomposition $t=C[t^{\prime}]$ such that *

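$$e^{t^{\prime}}_{\overline{x}}\leq l\,n^{t}_{x}+(l+1)\,m^{t}_{\overline{x}}\quad\mbox{and}\quad e^{t^{\prime}}_{\overline{y}}\geq n^{t}_{\overline{y}}-l\,n^{t}_{x}-m^{t}_{\overline{y}}.$$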

We call the pair $(C,t^{\prime})$ an $x,y,l$ -asynchronous split of $t$ .

*Lemma 5**.*

For any $k,l>0$ , there is a pair $o_{k,l}=(w_{k,l},G_{k,l})\in\mathsf{CSD}$ such that every $x,y,l$ -asynchronous $t$ with $o_{k,l}=(\operatorname{\mathsf{yd}}(t),\operatorname{\mathsf{graph}}(t))$ is $k$ -distant.

*Proof 3**.*

For $x\in\{a,b,c,d\}$ and $m\in\mathbb{N}_{0}$ , let $\overline{x}^{(m)}$ denote the word $\langle^{m}\,\overline{x}^{m}\,\rangle^{m}$ . Let $r=s=3l+k+2$ and $o_{k,l}=(w_{k,l},G_{k,l})$ be the unique element of $\mathsf{CSD}$ such that

[TABLE]

Let $t$ be an $a,c,l$ -asynchronous derivation of $o_{k,l}$ ; other choices of $x/y\in\mathsf{Sep}$ are analogous. By definition, we can split $t=C[t^{\prime}]$ such that $\operatorname{\mathsf{graph}}(t^{\prime})$ has at most $q_{a}=lr+(l+1)s=(2l+1)s$ $\overline{a}$ -edges and at least $q_{c}=rs-rl-s=(2l+k+1)s$ $\overline{c}$ -edges. Notice first that $\operatorname{\mathsf{graph}}(t^{\prime})$ contains at most $2l+1$ different complete $a$ -blocks of $G_{k,l}$ , because each $a$ -block contains $s$ $\overline{a}$ -edges. Having $2l+2$ of them would require $(2l+2)s$ $\overline{a}$ -edges, which is more than the $q_{a}$ $\overline{a}$ -edges that $\operatorname{\mathsf{graph}}(t^{\prime})$ can contain.

Next, consider $2l+k$ distinct $a$ -blocks of $G_{k,l}$ . These blocks contain a total of $(2l+k)s<(2l+k+1)s=q_{c}$ $\overline{c}$ -edges. Hence, the $\overline{c}$ -edges of $\operatorname{\mathsf{graph}}(t^{\prime})$ cannot be contained within only $2l+k$ distinct blocks.

So we can find at least $2l+k+1$ $\overline{c}$ -edges in $\operatorname{\mathsf{graph}}(t^{\prime})$ which are pairwise non-equivalent. There are at least $k$ edges among these which are equivalent to an edge in $G_{k,l}\setminus\operatorname{\mathsf{graph}}(t^{\prime})$ , because $\operatorname{\mathsf{graph}}(t^{\prime})$ contains at most $2l+1$ complete $a$ -blocks of $G_{k,l}$ . Thus, $t$ is $k$ -distant.

5.2 LM-CFTG derivations are asynchronous

So far, we have not used the assumption that ${\mathbf{T}}$ is an LM-CFTL. We will now exploit the pumping lemma to show that all derivation trees of an LM-CFTG for $\mathsf{CSD}$ must be asynchronous.

*Lemma 6**.*

If ${\mathbf{T}}$ is an LM-CFTL, then there exists $l_{0}\in\mathbb{N}_{0}$ such that for every $t\in{\mathbf{T}}$ , there exists $x/y\in\mathsf{Sep}$ such that $t$ is $x,y,l_{0}$ -asynchronous.

We prove this lemma by appealing to a class of derivation trees in which predicate and argument tokens are generated in separate parts.

*Definition 2** ( $x,y,l$ -separated derivation).*

Let $x/y\in\mathsf{Sep}$ . A tree $t\in\mathcal{T}_{\Delta}$ is $x,y,l$ -separated if we can write $t=C_{x}[C_{0}[t_{y}]]$ such that $|\operatorname{\mathsf{yd}}(t_{y})|_{x}=0$ and $|\operatorname{\mathsf{yd}}(C_{x})|_{y}=0$ and $|\operatorname{\mathsf{yd}}(C_{0})|_{x}\leq l$ . The triple $(C_{x},C_{0},t_{y})$ is called an $l$ -separation of $t$ . We call an $l$ -separation minimal if there is no other $l$ -separation of $t$ with a smaller $C_{0}$ .
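The three counting conditions of Definition 2 can be checked directly on yields. The sketch below works on strings only and is not an implementation over derivation trees: it assumes each context is represented by a hypothetical `(left, right)` pair of strings with $\operatorname{\mathsf{yd}}(C)=\operatorname{\mathsf{left}}(C)\cdot\operatorname{\mathsf{right}}(C)$ , and that the yield is projective.

```python
def is_separation(cx, c0, ty, x, y, l):
    """Check the counting conditions of an x,y,l-separation (C_x, C_0, t_y)."""
    yd_cx = cx[0] + cx[1]  # yield of the upper context C_x
    yd_c0 = c0[0] + c0[1]  # yield of the middle context C_0
    return ty.count(x) == 0 and yd_cx.count(y) == 0 and yd_c0.count(x) <= l


def assembled_yield(cx, c0, ty):
    """Projective yield of C_x[C_0[t_y]]."""
    return cx[0] + c0[0] + ty + c0[1] + cx[1]


# One way to cut the string aabccd into an a,c,1-separation.
cx, c0, ty = ("a", "d"), ("ab", ""), "cc"
print(assembled_yield(cx, c0, ty), is_separation(cx, c0, ty, "a", "c", 1))
```

In this example the upper context contributes an $a$ but no $c$ , the lower tree contributes both $c$ 's but no $a$ , and the middle context contains a single $a$ , so the split is a $1$ -separation but not a $0$ -separation.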

Intuitively, we can use the pumping lemma to systematically remove some contexts from a $t\in{\mathbf{T}}$ . From the shape of $\mathsf{CSD}$ , we can conclude certain alignments between the strings and graphs generated by these contexts and establish bounds on the number of $\overline{x}$ - and $\overline{y}$ -edges generated by the lower part of a separated derivation. The full proof is in the appendix; we sketch the main ideas here.

Let $p$ denote the pumping height of ${\mathbf{T}}$ . There is a maximal number of string tokens and edges that a context of height at most $p$ can generate under a given yield and homomorphism. We call this number $l_{0}$ in the rest of the proof.

*Lemma 7**.*

For $t\in{\mathbf{T}}$ , let $r^{t}_{\overline{y}}$ be the length of the maximal substring of $\operatorname{\mathsf{yd}}(t)$ consisting only of $\overline{y}$ -tokens and containing the rightmost occurrence of $\overline{y}$ in $\operatorname{\mathsf{yd}}(t)$ . If $t$ is $x,y,l_{0}$ -separated, there exists a minimal $x,y,l_{0}$ -separation $D_{x}[D_{0}[t_{y}]]$ of $t$ such that, letting $t_{0}=D_{0}[t_{y}]$ , $e^{t_{0}}_{\overline{y}}\geq n^{t}_{\overline{y}}-n^{t}_{x}l_{0}-r^{t}_{\overline{y}}$ .

Moreover, for any $x,y,l_{0}$ -separation $t=E_{x}[E_{0}[t^{1}_{y}]]$ , letting $t_{1}=E_{0}[t^{1}_{y}]$ , $e^{t_{1}}_{\overline{x}}\leq n^{t_{1}}_{\overline{x}}+n^{t}_{x}l_{0}$ .

*Proof 4** (sketch).*

Both statements must be proved by separate inductions on the height of $t$ , although they mostly follow similar steps. We therefore focus here only on the crucial parts of the (slightly trickier) bound on $e^{t_{0}}_{\overline{y}}$ . Let $D_{x}[D_{0}[t_{y}]]$ be a minimal $x,y,l_{0}$ -separation of $t$ and $t_{0}=D_{0}[t_{y}]$ .

Base Case If $\operatorname{\mathsf{ht}}(t)\leq p$ , we have $n^{t}_{\overline{y}}\leq l_{0}$ . We also have $n^{t}_{x}>0$ , so $n^{t}_{\overline{y}}-n^{t}_{x}l_{0}-r^{t}_{\overline{y}}\leq 0\leq e^{t_{0}}_{\bar{y}}$ .

Induction step If $\operatorname{\mathsf{ht}}(t)>p$ , we apply Lemma 1 to $t$ to yield a decomposition $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ , where $t^{\prime}=C_{1}[C_{3}[t_{5}]]\in{\mathbf{T}}$ , $\operatorname{\mathsf{ht}}(t^{\prime})<\operatorname{\mathsf{ht}}(t)$ and $\operatorname{\mathsf{ht}}(C_{2}[C_{3}[C_{4}]])\leq p$ . We first observe that $t^{\prime}$ is $x,y,l_{0}$ -separated. By induction, there exists a minimal separation $t^{\prime}=C_{x}[C_{0}[t^{\prime}_{y}]]$ with $t_{0}^{\prime}=C_{0}[t^{\prime}_{y}]$ validating the bound on $e^{t^{\prime}_{0}}_{\overline{y}}$ . Because of pumping considerations, we need to distinguish only three configurations of $C_{2}$ and $C_{4}$ . We present only the most difficult case here.

*In this case $C_{2}$ and $C_{4}$ generate only one kind of bar symbol, $\overline{y}$ , and brackets. One needs to examine all possible ways $C_{2}$ , $C_{4}$ and $t_{0}$ may overlap. We detail the reasoning in the case where $t_{0}$ does not overlap with $C_{2}$ or $C_{4}$ . Then, since all $y$ -tokens are generated by $t_{0}$ , projectivity of the yield and the definition of $\mathsf{CSD}$ impose that the generated $\overline{y}$ -tokens contribute to the rightmost $\overline{y}$ -chain, i.e. $r^{t}_{\overline{y}}=r^{t^{\prime}}_{\overline{y}}+n^{C_{2}[C_{4}]}_{\overline{y}}$ . Since $C_{2}$ and $C_{4}$ generate no $x$ -token, we also have $n^{t^{\prime}}_{x}=n^{t}_{x}$ . Hence $e^{t_{0}}_{\overline{y}}\geq e^{t^{\prime}_{0}}_{\overline{y}}\geq n^{t}_{\overline{y}}-n^{C_{2}[C_{4}]}_{\overline{y}}-n^{t}_{x}l_{0}-r^{t}_{\overline{y}}+n^{C_{2}[C_{4}]}_{\overline{y}}=n^{t}_{\overline{y}}-n^{t}_{x}l_{0}-r^{t}_{\overline{y}}$ . *

*Lemma 8**.*

For any $t\in{\mathbf{T}}$ , if $t$ is $x,y,l_{0}$ -separated then $t$ is $x,y,l_{0}$ -asynchronous.

*Proof 5**.*

By Lemma 7 there is a minimal $x,y,l_{0}$ -separation $t=D_{x}[D_{0}[t_{y}]]$ such that, for $t_{0}=D_{0}[t_{y}]$ , the bound on $e^{t_{0}}_{\overline{y}}$ and the bound on $e^{t_{0}}_{\overline{x}}$ both obtain. Observe that $r^{t}_{\overline{y}}\leq m^{t}_{\overline{y}}$ by definition, and since $t_{0}$ generates at most $l_{0}$ $x$ -tokens, by projectivity it generates at most $(l_{0}+1)m^{t}_{\overline{x}}$ $\overline{x}$ -tokens (one sequence of $m^{t}_{\overline{x}}$ between each occurrence of $x$ and the next, plus possibly one before the first and one after the last). Thus $t$ is $x,y,l_{0}$ -asynchronous.

*Lemma 9**.*

For any $t\in{\mathbf{T}}$ , $t$ is $x,y,l_{0}$ -separated for some $x/y\in\mathsf{Sep}$ .

*Proof 6** (sketch).*

The proof proceeds by induction on the height of $t$ .

If $\operatorname{\mathsf{ht}}(t)\leq p$ , then $|\operatorname{\mathsf{yd}}(t)|_{z}\leq l_{0}$ for any $z\in\{a,b,c,d\}$ , hence $t$ is trivially $x,y,l_{0}$ -separated for some $x/y\in\mathsf{Sep}$ .

If $\operatorname{\mathsf{ht}}(t)>p$ , Lemma 1 yields a decomposition $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ , where $t^{\prime}=C_{1}[C_{3}[t_{5}]]\in{\mathbf{T}}$ , $\operatorname{\mathsf{ht}}(t^{\prime})<\operatorname{\mathsf{ht}}(t)$ and $\operatorname{\mathsf{ht}}(C_{2}[C_{3}[C_{4}]])\leq p$ . By induction $t^{\prime}$ is $x,y,l_{0}$ -separated for some $x/y\in\mathsf{Sep}$ . Let us assume $x/y=a/c$ ; the other cases are analogous. The challenge is to establish that $t$ is still separated after $C_{2}$ and $C_{4}$ are reinserted into $t^{\prime}$ .

If $C_{2}$ and $C_{4}$ generate no $a$ - or $c$ -token, the distribution of $a$ - and $c$ -tokens in the tree is not affected, hence $t$ is $a,c,l_{0}$ -separated. Otherwise, due to pumping considerations, we need to distinguish three possible configurations regarding the shape of the yields of $C_{2}$ and $C_{4}$ . We present one here, see the appendix for the others; they are in the same spirit.

We consider the case where $\operatorname{\mathsf{left}}(C_{2})$ contains some $a$ -token and no $b,c,d$ -tokens, and $\operatorname{\mathsf{yd}}(C_{4})$ contains some $c$ -token. Assume $\operatorname{\mathsf{left}}(C_{4})$ contains some $c$ . It follows that all $b$ -tokens are generated by $C_{3}$ . So $t$ has fewer than $l_{0}$ $b$ -tokens; by definition of $\mathsf{CSD}$ it then also has fewer than $l_{0}$ $d$ -tokens, so $(C_{1},C_{2}[C_{3}[C_{4}]],t_{5})$ is a $d,b,l_{0}$ -separation. Assume now that $\operatorname{\mathsf{right}}(C_{4})$ contains some $c$ . It follows that $t_{5}$ generates no $d$ -token and $C_{1}$ generates no $b$ -token. Hence $(C_{1},C_{2}[C_{3}[C_{4}]],t_{5})$ is a $b,d,l_{0}$ -separation.

This concludes the proof of Lemma 6 and Thm. 2.

6 Conclusion

We have established a notion of expressive capacity in compositional semantic parsing. We have proved that non-projective grammars can express sentence-meaning relations with bounded memory that projective ones cannot. This answers an old question in the design of compositional systems: assuming projective syntax, lambda-style compositional mechanisms can be more expressive than unification-style ones, which have bounded “memory” for unfilled arguments.

From a theoretical perspective, the stronger result of this paper is perhaps Thm. 2, which shows without further assumptions that weakly equivalent grammar formalisms can differ in their semantic expressive capacity. However, Thm. 1 may have a clearer practical impact on the development of compositional semantic parsers. Consider, for instance, the case of CCG, a lexicalized grammar formalism that has been widely used in semantic parsing Bos (2008); Artzi et al. (2015); Lewis et al. (2016). While a potentially infinite set of syntactic categories can be used in the parses of a single CCG grammar, CCG derivations are still projective in our sense. Thus, if one assumes that derivations should be aligned (which is natural for a lexicalized grammar), Thm. 1 implies that CCG with lambda-style semantic composition is more semantically expressive than with unification-style composition. Indeed, lambda-style compositional mechanisms are the dominant approach in CCG Steedman (2001); Baldridge and Kruijff (2002); Artzi et al. (2015).

Furthermore, under the alignment assumptions of Section 4, no unification-style compositional mechanism can describe string-meaning relations like $\mathsf{CSD}$ . This includes neural models. For instance, most transition-based parsers Nivre (2008); Andor et al. (2016); Dyer et al. (2016) are projective, in that the parsing operations can only concatenate two substrings on the top of the stack if they are adjacent in the string. Such transition systems can therefore not be extended to transition-based semantic parsers Damonte et al. (2017) without (a) losing expressive capacity, (b) giving up compositionality, (c) adding mechanisms for non-projectivity Gómez-Rodríguez et al. (2018), or (d) using a lambda-style semantic algebra. Thus our result clarifies how to build an effective and accurate semantic parser.

We have focused on whether a grammar formalism is projective or not, while holding the semantic algebra fixed. In the future, it would be interesting to explore how a unification-style compositional mechanism can be converted to a lambda-style mechanism with unbounded placeholders. This would allow us to specify and train semantic parsers using such abstractions, while benefiting from the efficiency of projective parsers.

Acknowledgments

We are grateful to Emily Bender, Guy Emerson, Meaghan Fowlie, Jonas Groschwitz, and the participants of the DELPH-IN workshop 2018 for fruitful discussions, and to the anonymous reviewers for their insightful feedback.

Appendix A Details of the proof of Theorem 1

*Lemma 4**.*

Let ${\mathcal{G}}=(G,\operatorname{\mathsf{yd}})$ be a projective string grammar. For any $r\in\mathbb{N}_{0}$ there exists $s\in\mathbb{N}_{0}$ such that any $t\in L(G)$ with $\operatorname{\mathsf{yd}}(t)\in a^{*}b^{s}c^{s}d^{*}$ has a subtree $t^{\prime}$ such that $\operatorname{\mathsf{yd}}(t^{\prime})$ contains $r$ occurrences of $x$ and no occurrences of $y$ , for some $x/y\in\mathsf{Sep}$ .

*Proof 7**.*

Depending on $\operatorname{\mathsf{yd}}$ , one can always choose $s>r$ such that any $t$ with $|\operatorname{\mathsf{yd}}(t)|>2s$ has at least one strict subtree $t^{\prime}$ with $|\operatorname{\mathsf{yd}}(t^{\prime})|\geq 2r$ .

The lemma follows by induction over the height of $t$ . It is trivially true for height 1. For the induction step, consider that $w^{\prime}=\operatorname{\mathsf{yd}}(t^{\prime})$ must have at least $r$ occurrences of some letter because of projectivity and the shape of $\operatorname{\mathsf{yd}}(t)$ ; assume it is $a$ , the other cases are analogous. If $w^{\prime}$ has no occurrences of $c$ , we are done. Otherwise, by projectivity, $w^{\prime}$ contains all the $b$ ’s, i.e. $w^{\prime}\in a^{*}b^{s}c^{+}d^{*}$ . In this case, either $w^{\prime}$ contains $s>r$ occurrences of $b$ and no occurrences of $d$ , in which case we are again done, or it contains an occurrence of $d$ . In the latter case, $w^{\prime}\in a^{*}b^{s}c^{s}d^{*}$ is in the shape required by the lemma, and we can apply the induction hypothesis to identify a subtree $t^{\prime\prime}$ of $t^{\prime}$ with $r$ occurrences of some $x$ and none of the corresponding $y$ ; and $t^{\prime\prime}$ is also a subtree of $t$ .

Appendix B Details of the proof of Theorem 2

In all of the following, we assume that for some $k\in\mathbb{N}_{0}$ we have $\mathsf{CSD}=\mathcal{R}\mathcal{E}\mathcal{L}({\mathcal{G}},h,\mathcal{H}_{k})$ , where ${\mathcal{G}}=(G,\operatorname{\mathsf{yd}})$ is an LM-CFTG (hence projective, i.e. $\operatorname{\mathsf{yd}}=\operatorname{\operatorname{\mathsf{yd}}_{pr}}$ ). We let ${\mathbf{T}}=L(G)$ and $p$ be the pumping height of ${\mathbf{T}}$ .

B.1 Terminology

Let us extend the domain of $\operatorname{\mathsf{yd}}$ to contexts: for a context $C$ , we let $\operatorname{\mathsf{yd}}(C)=\operatorname{\mathsf{left}}(C)\cdot\operatorname{\mathsf{right}}(C)$ .

We say that a string $s$ is balanced if, for any $z\in\{\overline{a},\overline{b},\overline{c},\overline{d}\}$ and any position $i$ in $s$ such that $s_{i}=z$ , there are two encompassing positions $k\leq i\leq l$ such that $s_{[k,l]}\in\{\langle^{n}z^{n}\rangle^{n}\mid n\in\mathbb{N}_{0}\}$ . We say that a tree or a context is balanced if its yield is balanced. By construction, all trees of ${\mathbf{T}}$ are balanced.

For $t\in{\mathbf{T}}$ , a pumping decomposition of $t$ is a $5$ -tuple $(C_{1},C_{2},C_{3},C_{4},t_{5})$ , consisting of four contexts $C_{1}$ to $C_{4}$ and one tree $t_{5}$ , such that $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ , $\operatorname{\mathsf{ht}}(C_{2}[C_{3}[C_{4}]])\leq p$ , $\operatorname{\mathsf{ht}}(C_{2})+\operatorname{\mathsf{ht}}(C_{4})>0$ and, for any $i\in\mathbb{N}_{0}$ , $C_{1}[v^{i}[t_{5}]]\in{\mathbf{T}}$ , where we let $v^{0}=C_{3}$ and $v^{i+1}=C_{2}[v^{i}[C_{4}[X]]]$ .
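The effect of pumping on yields can be sketched in a few lines of Python. The sketch abstracts each context to its pair of yields $(\operatorname{\mathsf{left}}(C),\operatorname{\mathsf{right}}(C))$, which is enough under a projective yield, where $\operatorname{\mathsf{yd}}(C[t])=\operatorname{\mathsf{left}}(C)\cdot\operatorname{\mathsf{yd}}(t)\cdot\operatorname{\mathsf{right}}(C)$. The class `Ctx`, the functions `pump` and `pumped_yield`, and the use of plain strings are illustrative assumptions, not part of the formal development.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ctx:
    """A context, abstracted to what it yields left and right of its hole."""
    left: str
    right: str

    def plug(self, inner: "Ctx") -> "Ctx":
        # Projective composition C[C']: the inner context sits in the hole.
        return Ctx(self.left + inner.left, inner.right + self.right)

    def fill(self, tree_yield: str) -> str:
        # yd(C[t]) = left(C) . yd(t) . right(C)
        return self.left + tree_yield + self.right


def pump(c2: Ctx, c3: Ctx, c4: Ctx, i: int) -> Ctx:
    """Return v^i, where v^0 = C3 and v^(i+1) = C2[v^i[C4[X]]]."""
    v = c3
    for _ in range(i):
        v = c2.plug(v.plug(c4))
    return v


def pumped_yield(c1: Ctx, c2: Ctx, c3: Ctx, c4: Ctx, t5_yield: str, i: int) -> str:
    """Yield of C1[v^i[t5]]."""
    return c1.plug(pump(c2, c3, c4, i)).fill(t5_yield)
```

Under these assumptions, pumping $i$ times repeats $\operatorname{\mathsf{left}}(C_{2})$ and $\operatorname{\mathsf{left}}(C_{4})$ $i$ times on the left of $t_{5}$, and $\operatorname{\mathsf{right}}(C_{4})$ and $\operatorname{\mathsf{right}}(C_{2})$ $i$ times on its right, with the material of $C_{3}$ staying in between.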

B.2 Pumping considerations

*Lemma 10**.*

Let $t\in{\mathbf{T}}$ with $\operatorname{\mathsf{ht}}(t)>p$ , and consider a pumping decomposition $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ . Let $s=\operatorname{\mathsf{left}}(C_{2})\cdot\operatorname{\mathsf{left}}(C_{4})\cdot\operatorname{\mathsf{right}}(C_{4})\cdot\operatorname{\mathsf{right}}(C_{2})=\operatorname{\mathsf{yd}}(C_{2}[C_{4}])$ . The two following propositions obtain:

• For any $(x,y)\in\{(a,c),(b,d)\}$ , $|s|_{x}=|s|_{y}$ .

• Let $t^{\prime}=C_{1}[C_{3}[t_{5}]]$ . For $u\in\mathcal{T}_{\Delta}$ and $z\in\{a,b,c,d,\overline{a},\overline{b},\overline{c},\overline{d}\}$ let $e^{u}_{z}$ denote the number of $z$ -edges in $\operatorname{\mathsf{graph}}(u)$ . It holds for any $z\in\{a,b,c,d,\overline{a},\overline{b},\overline{c},\overline{d}\}$ that $e^{t}_{z}=e^{t^{\prime}}_{z}+|s|_{z}$ .

*Proof 8**.*

Let $(x,y)\in\{(a,c),(b,d)\}$ . $t\in{\mathbf{T}}$ so $\operatorname{\mathsf{yd}}(t)\in\mathsf{CSD}_{s}$ which entails

$$|\operatorname{\mathsf{yd}}(t)|_{x}=|\operatorname{\mathsf{yd}}(t)|_{y}\tag{1}$$

since $t^{\prime}\in{\mathbf{T}}$ by construction, we have $\langle\operatorname{\mathsf{yd}}(t^{\prime}),\operatorname{\mathsf{graph}}(t^{\prime})\rangle\in\mathsf{CSD}$ . From there

$$|\operatorname{\mathsf{yd}}(t^{\prime})|_{x}=|\operatorname{\mathsf{yd}}(t^{\prime})|_{y}\tag{2}$$

But $|\operatorname{\mathsf{yd}}(t)|_{x,y}=|\operatorname{\mathsf{yd}}(t^{\prime})|_{x,y}+|s|_{x,y}$ . Plugging this into (1) yields

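$$|\operatorname{\mathsf{yd}}(t^{\prime})|_{x}+|s|_{x}=|\operatorname{\mathsf{yd}}(t^{\prime})|_{y}+|s|_{y}$$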

Simplifying using (2) we find $|s|_{x}=|s|_{y}$ , which establishes the first point. For the second point, we have from $\langle\operatorname{\mathsf{yd}}(t),\operatorname{\mathsf{graph}}(t)\rangle\in\mathsf{CSD}$ and by definition of $\mathsf{CSD}$ that $e^{t}_{z}=|\operatorname{\mathsf{yd}}(t)|_{z}=|\operatorname{\mathsf{yd}}(t^{\prime})|_{z}+|s|_{z}$ . Similarly, since $\langle\operatorname{\mathsf{yd}}(t^{\prime}),\operatorname{\mathsf{graph}}(t^{\prime})\rangle\in\mathsf{CSD}$ we have $e^{t^{\prime}}_{z}=|\operatorname{\mathsf{yd}}(t^{\prime})|_{z}$ . Hence $e^{t}_{z}=e^{t^{\prime}}_{z}+|s|_{z}$ .
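The counting argument of the first point can be replayed on concrete yields. The sketch below only assumes what the proof uses, namely that every yield in $\mathsf{CSD}_{s}$ contains as many $x$ as $y$ tokens for $(x,y)\in\{(a,c),(b,d)\}$. Yields are modeled as lists of tokens, and the names `counts_match` and `pumped_material_ok` are hypothetical.

```python
from collections import Counter

PAIRS = (("a", "c"), ("b", "d"))


def counts_match(tokens):
    """True if the yield has as many x as y tokens for each pair (x, y)."""
    n = Counter(tokens)
    return all(n[x] == n[y] for x, y in PAIRS)


def pumped_material_ok(yield_t, yield_t_prime):
    """Check |s|_x = |s|_y, using |s|_z = |yd(t)|_z - |yd(t')|_z."""
    n_t, n_tp = Counter(yield_t), Counter(yield_t_prime)
    return all(n_t[x] - n_tp[x] == n_t[y] - n_tp[y] for x, y in PAIRS)
```

Whenever `counts_match` holds for both $\operatorname{\mathsf{yd}}(t)$ and $\operatorname{\mathsf{yd}}(t^{\prime})$, `pumped_material_ok` holds as well, which is the content of the first point.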

We will now present a pair of lemmas stating, in formal terms, that decompositions $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ provided by the pumping lemma all fall within a small number of configurations:

• First, in the case where the ‘pumpable’ contexts $C_{2}$ and $C_{4}$ generate only ‘bar’ tokens and brackets in $\{\overline{a},\overline{b},\overline{c},\overline{d},\langle,\rangle\}^{*}$ , we show that $\operatorname{\mathsf{yd}}(C_{2})\in\{\langle,\rangle\}^{*}$ , so that only $C_{4}$ is actually pumping ‘bar’ tokens of some kind. Moreover, $t_{5}$ generates only ‘bar’ tokens and brackets as well.

• Second, we explore the alternative, where the ‘pumpable’ contexts generate some of the ‘core’ tokens in $\{a,b,c,d\}$ , say (for the sake of this informal presentation) some $a$ -tokens. By Lemma 10, they must generate as many $c$ -tokens, and we can distinguish three possible configurations: 1. The $a$ ’s and $c$ ’s are generated on different sides of a single context ( $C_{2}$ and/or $C_{4}$ ), in which case neither $C_{2}$ nor $C_{4}$ generates any $b$ - or $d$ -tokens. 2. $C_{2}$ generates both $a$ - and $d$ -tokens (on the left and right sides respectively) and no $b$ - or $c$ -tokens, while $C_{4}$ generates the corresponding $b$ - and $c$ -tokens (on the left and right sides respectively). 3. One of $C_{2},C_{4}$ generates the $a$ -tokens and no $b$ -, $c$ - or $d$ -tokens, while the other generates the corresponding $c$ -tokens and no $a$ -, $b$ - or $d$ -tokens.

Below follows the formal presentation of these lemmas:

*Lemma 11**.*

Let $t\in{\mathbf{T}}$ with $\operatorname{\mathsf{ht}}(t)>p$ , and consider a pumping decomposition $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ such that for all $z\in\{a,b,c,d\}$ , $|\operatorname{\mathsf{yd}}(C_{2}[C_{4}])|_{z}=0$ . There is $\overline{x}\in\{\overline{a},\overline{b},\overline{c},\overline{d}\}$ such that all of the following hold:

1. $\operatorname{\mathsf{yd}}(C_{2})\in\{\langle,\rangle\}^{*}$ and $\operatorname{\mathsf{yd}}(C_{4})\in\{\langle,\overline{x},\rangle\}^{*}$ .

2. Either $\operatorname{\mathsf{left}}(C_{4})\in\{\overline{x}\}^{*}$ and $|\operatorname{\mathsf{left}}(C_{3})|_{z}=0$ for any $z\in\{a,b,c,d\}$ , or symmetrically, $\operatorname{\mathsf{right}}(C_{4})\in\{\overline{x}\}^{*}$ and $|\operatorname{\mathsf{right}}(C_{3})|_{z}=0$ for any $z\in\{a,b,c,d\}$ .

3. $|\operatorname{\mathsf{yd}}(t_{5})|_{z}=0$ for any $z\in\{a,b,c,d\}$ .

*Proof 9**.*

First point: Let $s=\operatorname{\mathsf{left}}(C_{1})$ and $n_{0}=|s|_{\langle}$ . Let $y\in\{\overline{a},\overline{b},\overline{c},\overline{d}\}$ and assume $y$ occurs in $\operatorname{\mathsf{left}}(C_{2})$ . Pumping $C_{2}$ and $C_{4}$ $n_{0}+1$ times yields a tree $t_{n_{0}+1}\in{\mathbf{T}}$ such that $s\cdot\operatorname{\mathsf{left}}(C_{2})^{n_{0}+1}$ is a prefix of $\operatorname{\mathsf{yd}}(t_{n_{0}+1})$ . We thus see that $t_{n_{0}+1}$ is not balanced, which is in contradiction with $t_{n_{0}+1}\in{\mathbf{T}}$ . A symmetric argument establishes that $y$ does not occur in $\operatorname{\mathsf{right}}(C_{2})$ .

Assume now that there are two distinct $\overline{x},\overline{y}\in\{\overline{a},\overline{b},\overline{c},\overline{d}\}$ such that $\overline{x}\in\operatorname{\mathsf{yd}}(C_{4})$ and $\overline{y}\in\operatorname{\mathsf{yd}}(C_{4})$ . Notice that, since $C_{4}$ does not contain non-bar tokens, if $\overline{x}$ and $\overline{y}$ occur on the same side of $C_{4}$ (for instance $\operatorname{\mathsf{left}}(C_{4})=\langle\overline{x}\rangle\langle\overline{y}\rangle$ ) then $t\notin{\mathbf{T}}$ because no string in $\mathsf{CSD}_{s}$ admits $\operatorname{\mathsf{left}}(C_{4})$ as a substring, whereas $\operatorname{\mathsf{yd}}(t)$ does. So $\overline{x}$ and $\overline{y}$ must occur on distinct sides. It follows that $C_{4}$ does not generate tokens in $\{\langle,\rangle\}$ either: if for instance $\operatorname{\mathsf{left}}(C_{4})=u\cdot{\langle}\cdot\overline{x}\cdot v$ for some strings $u$ and $v$ in $\{\overline{x},\langle,\rangle\}^{*}$ , then $u\cdot{\langle}\cdot\overline{x}\cdot v\cdot u\cdot{\langle}\cdot\overline{x}\cdot v$ would be a substring of the yield of $C_{1}[C_{2}[C_{2}[C_{3}[C_{4}[C_{4}[t_{5}]]]]]]\in{\mathbf{T}}$ , which again is a contradiction. Let now $n_{1}=|\operatorname{\mathsf{yd}}(t_{5})|_{\rangle}$ . Pumping $C_{2}$ and $C_{4}$ $n_{1}+1$ times yields a tree $t_{n_{1}+1}\in{\mathbf{T}}$ whose yield has a substring of the form $\overline{x}^{n_{1}+1}\operatorname{\mathsf{yd}}(t_{5})\overline{y}^{n_{1}+1}$ (up to $x/y$ symmetry), which cannot be balanced, yielding a final contradiction.

Second point: $\operatorname{\mathsf{yd}}(C_{4})\notin\{\langle,\rangle\}^{*}$ , because otherwise pumping $C_{2}$ and $C_{4}$ more times than the maximum number of occurrences of a bar token in $\operatorname{\mathsf{yd}}(t)$ would yield an unbalanced tree. So there is an $\overline{x}$ such that $\overline{x}\in\operatorname{\mathsf{left}}(C_{4})$ or $\overline{x}\in\operatorname{\mathsf{right}}(C_{4})$ . Assume for contradiction that a different token occurs on the same side of $C_{4}$ ; then the yield of $C_{1}[C_{2}[C_{2}[C_{3}[C_{4}[C_{4}[t_{5}]]]]]]\in{\mathbf{T}}$ contains a substring that cannot be found in any string of $\mathsf{CSD}_{s}$ , yielding a contradiction. So $\operatorname{\mathsf{left}}(C_{4})\in\{\overline{x}\}^{*}$ or $\operatorname{\mathsf{right}}(C_{4})\in\{\overline{x}\}^{*}$ . Assume $\operatorname{\mathsf{left}}(C_{4})\in\{\overline{x}\}^{*}$ , the other case is symmetric. Assume for contradiction that $|\operatorname{\mathsf{left}}(C_{3})|_{z}>0$ for some $z\in\{a,b,c,d\}$ . Let $n_{2}=|\operatorname{\mathsf{yd}}(C_{3})|_{\langle}$ . Pumping $C_{2}$ and $C_{4}$ $n_{2}+1$ times yields a tree $t^{n_{2}+1}\in{\mathbf{T}}$ such that (by projectivity) $\operatorname{\mathsf{yd}}(t^{n_{2}+1})$ has a substring of the form $z\cdot u\cdot\overline{x}^{n_{2}+1}$ where $|u|_{\langle}\leq n_{2}$ . Hence $t^{n_{2}+1}\in{\mathbf{T}}$ is not balanced, yielding a contradiction.

Third point: Assume for contradiction that $|\operatorname{\mathsf{yd}}(t_{5})|_{z}>0$ for some $z\in\{a,b,c,d\}$ . Assume that $\operatorname{\mathsf{left}}(C_{4})\in\{\overline{x}\}^{*}$ ; the case $\operatorname{\mathsf{right}}(C_{4})\in\{\overline{x}\}^{*}$ is symmetric, and point 2 ensures that these two cases are exhaustive. Let $n_{3}=|\operatorname{\mathsf{yd}}(t_{5})|_{\rangle}$ and consider the tree $t^{n_{3}+1}\in{\mathbf{T}}$ obtained by pumping $C_{2}$ and $C_{4}$ $n_{3}+1$ times. By projectivity, $\operatorname{\mathsf{yd}}(t^{n_{3}+1})$ has a substring of the form $\overline{x}^{k}\cdot u\cdot z\cdot v$ with $k\geq n_{3}+1$ and $|u|_{\rangle}\leq n_{3}$ . Hence $t^{n_{3}+1}$ is not balanced and $t^{n_{3}+1}\notin{\mathbf{T}}$ , yielding a contradiction.

*Lemma 12**.*

Let $t\in{\mathbf{T}}$ with $\operatorname{\mathsf{ht}}(t)>p$ , and consider a pumping decomposition $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ . Let $(x,y,X,Y)\in\{(a,c,A,C),(b,d,B,D)\}$ such that $|\operatorname{\mathsf{yd}}(C_{2}[C_{4}])|_{x}\neq 0$ . One of the following obtains:

1. For some $(i,j)\in\{(2,4),(4,2)\}$ , $\operatorname{\mathsf{left}}(C_{i})\in X^{+}$ , $\operatorname{\mathsf{right}}(C_{i})\in Y^{+}$ , $\operatorname{\mathsf{left}}(C_{j})\in X^{*}$ and $\operatorname{\mathsf{right}}(C_{j})\in Y^{*}$ .

2. $\operatorname{\mathsf{left}}(C_{2})\in A^{+}$ , $\operatorname{\mathsf{right}}(C_{2})\in D^{+}$ , $\operatorname{\mathsf{left}}(C_{4})\in B^{+}$ and $\operatorname{\mathsf{right}}(C_{4})\in C^{+}$ .

3. Either $\operatorname{\mathsf{left}}(C_{2})\in X^{+}$ , $\operatorname{\mathsf{right}}(C_{2})=\epsilon$ and $\operatorname{\mathsf{left}}(C_{4})\cdot\operatorname{\mathsf{right}}(C_{4})\in Y^{+}$ , or symmetrically $\operatorname{\mathsf{left}}(C_{2})=\epsilon$ , $\operatorname{\mathsf{right}}(C_{2})\in Y^{+}$ and $\operatorname{\mathsf{left}}(C_{4})\cdot\operatorname{\mathsf{right}}(C_{4})\in X^{+}$ .

*Proof 10**.*

All these observations follow easily from the first point of Lemma 10 (governing the relative number of occurrences of $a,c$ -tokens on one hand and $b,d$ -tokens on the other hand), projectivity, and the following observation: a single side of $C_{2}$ or $C_{4}$ can neither generate two different kinds of tokens in $\{a,b,c,d\}$ nor be unbalanced. Otherwise, by projectivity, pumping would produce a tree whose yield has a substring of a shape impossible for $\mathsf{CSD}$ (for example, if both $a$ - and $b$ -tokens occur on the same side of $C_{2}$ , pumping once produces a substring $a\cdot u\cdot b\cdot v\cdot a\cdot u\cdot b\cdot v$ ).
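The example given in this proof can be checked mechanically. The sketch below assumes that, once bar tokens and brackets are erased, every yield in $\mathsf{CSD}_{s}$ has the shape $a^{*}b^{*}c^{*}d^{*}$. This is an assumption made for illustration, in line with the string shapes used in Appendix A; the token names and the helpers `core_projection`, `core_shape_ok` and `pump_side` are hypothetical.

```python
import re

CORE = frozenset("abcd")
SHAPE = re.compile(r"a*b*c*d*")


def core_projection(tokens):
    """Keep only the core tokens a, b, c, d (drop bar tokens and brackets)."""
    return "".join(tok for tok in tokens if tok in CORE)


def core_shape_ok(tokens):
    """True if the core tokens of the yield appear in the order a* b* c* d*."""
    return SHAPE.fullmatch(core_projection(tokens)) is not None


def pump_side(side, times):
    """One side of a pumpable context, repeated `times` extra times."""
    return side * (times + 1)


# left(C2) generates both an a-token and a b-token (with bar material between).
left_c2 = ["a", "<", "a_bar", ">", "b"]
rest = ["c", "d"]

unpumped = pump_side(left_c2, 0) + rest   # core projection: abcd
pumped_once = pump_side(left_c2, 1) + rest  # core projection: ababcd
```

Here `core_shape_ok(unpumped)` holds, while `core_shape_ok(pumped_once)` fails because an $a$ follows a $b$, which is the impossible substring shape mentioned in the proof.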

B.3 Separation

*Lemma 13**.*

Let $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]\in{\mathbf{T}}$ and $t^{\prime}=C_{1}[C_{3}[t_{5}]]\in{\mathbf{T}}$ . If $t$ is $x,y,l$ -separated then so is $t^{\prime}$ .

*Proof 11**.*

Consider an $x,y,l$ -separation of $t$ : $t=D_{x}[D_{0}[t_{y}]]$ . Let $C_{x},C_{0}$ and $t^{\prime}_{y}$ be respectively obtained by removing all nodes from $C_{2}$ or $C_{4}$ from $D_{x}$ , $D_{0}$ and $t_{y}$ . One easily checks that $t^{\prime}=C_{x}[C_{0}[t^{\prime}_{y}]]$ .

Moreover, $|\operatorname{\mathsf{yd}}(C_{x})|_{y}\leq|\operatorname{\mathsf{yd}}(D_{x})|_{y}=0$ , $|\operatorname{\mathsf{yd}}(C_{0})|_{x}\leq|\operatorname{\mathsf{yd}}(D_{0})|_{x}\leq l$ and $|\operatorname{\mathsf{yd}}(t^{\prime}_{y})|_{x}\leq|\operatorname{\mathsf{yd}}(t_{y})|_{x}=0$ . Hence $t^{\prime}$ is $x,y,l$ -separated.
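The three inequalities used in this proof can be read as a predicate on yields. The sketch below assumes that an $x,y,l$ -separation $D_{x}[D_{0}[t_{y}]]$ is characterized by exactly these three conditions (Definition 2 in the main text remains the authoritative statement). Each part is modeled by the flat list of tokens it generates, and `is_separation` and `remove_positions` are hypothetical helper names.

```python
def is_separation(yd_dx, yd_d0, yd_ty, x, y, l):
    """Conditions used in the proof: D_x has no y, D_0 has at most l x's, t_y has no x."""
    return (yd_dx.count(y) == 0
            and yd_d0.count(x) <= l
            and yd_ty.count(x) == 0)


def remove_positions(tokens, removed):
    """Drop the tokens contributed by the nodes of C2 and C4 (given by position)."""
    return [tok for i, tok in enumerate(tokens) if i not in removed]
```

Since removing nodes can only delete tokens, each of the three counts can only decrease, so the predicate is preserved when passing from $t$ to $t^{\prime}$.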

B.4 Minimality argument

*Lemma 14**.*

Let $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]\in{\mathbf{T}}$ and $t^{\prime}=C_{1}[C_{3}[t_{5}]]\in{\mathbf{T}}$ such that $t$ is $x,y,l$ -separated. By Lemma 13, $t^{\prime}$ is separated. Let $D_{x}[D_{0}[t_{y}]]$ be a minimal separation of $t$ and $C_{x}[C_{0}[t^{\prime}_{y}]]$ be a minimal separation of $t^{\prime}$ . $D_{0}[t_{y}]$ contains all nodes of $C_{0}[t^{\prime}_{y}]$ .

*Proof 12**.*

Assume for contradiction that a node $\pi$ of $C_{0}$ is not in $D_{0}$ . It must then be in $D_{x}$ or $t_{y}$ . Assume that it is in $D_{x}$ , the case where it is in $t_{y}$ is analogous. Since $\pi$ is not in $D_{0}$ , there is a non-trivial subcontext $D^{\prime}_{x}$ of $D_{x}$ rooted at $\pi$ , i.e. $D_{x}=D^{\prime\prime}_{x}[D^{\prime}_{x}]$ with $\operatorname{\mathsf{ht}}(D^{\prime}_{x})>0$ . Let $C^{\prime\prime}_{x},C^{\prime}_{x}$ be obtained by removing all nodes from $C_{2}$ or $C_{4}$ from $D^{\prime\prime}_{x}$ and $D^{\prime}_{x}$ respectively. By definition of $D_{x}$ , $|\operatorname{\mathsf{yd}}(D^{\prime\prime}_{x}[D^{\prime}_{x}])|_{y}=0$ , hence $|\operatorname{\mathsf{yd}}(C^{\prime\prime}_{x}[C^{\prime}_{x}])|_{y}=0$ . Further observe that we have $C_{x}[C_{0}]=C^{\prime\prime}_{x}[C^{\prime}_{x}[C^{\prime}_{0}]]$ for some subcontext $C^{\prime}_{0}$ of $C_{0}$ . Since $\pi$ is not in $C_{2}$ or $C_{4}$ , $\operatorname{\mathsf{ht}}(C^{\prime}_{x})>0$ , thus $\operatorname{\mathsf{ht}}(C^{\prime}_{0})<\operatorname{\mathsf{ht}}(C_{0})$ . But letting $E_{x}=C^{\prime\prime}_{x}[C^{\prime}_{x}]$ , $E_{x}[C^{\prime}_{0}[t^{\prime}_{y}]]$ is then an $x,y,l$ -separation of $t^{\prime}$ , which contradicts the assumed minimality of $C_{x}[C_{0}[t^{\prime}_{y}]]$ .

B.5 Inductive bounds

For any tree or context $t$ and symbol $x$ , let us write $n^{t}_{x}$ as a shorthand for $|\operatorname{\mathsf{yd}}(t)|_{x}$ , $e^{t}_{x}$ for the number of $x$ -edges generated by $t$ , and $r^{t}_{x}$ for the length of the rightmost maximal substring of $\operatorname{\mathsf{yd}}(t)$ consisting only of $x$ -tokens (more formally, $r^{t}_{x}=|s|_{x}$ , where $s$ is the unique substring such that $\operatorname{\mathsf{yd}}(t)=u\cdot s\cdot v$ where $s\in x^{*}$ , if $u$ is non-empty its last token is not $x$ , and $|v|_{x}=0$ ).
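As a concrete reading of the two yield-based quantities, the following Python sketch computes $n^{t}_{x}$ and $r^{t}_{x}$ from a yield given as a list of tokens. The function names and the token spelling (`"y_bar"` for $\overline{y}$ ) are assumptions made for the example; the edge count $e^{t}_{x}$ depends on the graph interpretation and is not modeled here.

```python
def count_tokens(tokens, x):
    """n^t_x: number of occurrences of token x in the yield."""
    return sum(1 for tok in tokens if tok == x)


def rightmost_run(tokens, x):
    """r^t_x: length of the rightmost maximal run of consecutive x tokens."""
    # Skip the suffix v, which contains no x at all.
    end = len(tokens)
    while end > 0 and tokens[end - 1] != x:
        end -= 1
    # Extend the run s leftwards until a non-x token (or the start) is reached.
    start = end
    while start > 0 and tokens[start - 1] == x:
        start -= 1
    return end - start


example = ["a", "y_bar", "y_bar", "b", "y_bar", "y_bar", "y_bar", "c"]
```

On `example`, `count_tokens(example, "y_bar")` is 5 and `rightmost_run(example, "y_bar")` is 3; for a token that does not occur, the run is empty and its length is 0.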

There is a maximal number of string tokens and edges that a context of height at most $p$ can generate under the considered yield and homomorphism. We call this number $l_{0}$ and, from now on, focus on $l_{0}$ -separated and $l_{0}$ -asynchronous derivations.

Below are the proofs of the two statements of Lemma 7 of the main paper (respectively, 7-1 and 7-2).

*Lemma 7-1**.*

If $t\in{\mathbf{T}}$ is $x,y,l_{0}$ -separated and $t=D_{x}[D_{0}[t_{y}]]$ is an $x,y,l_{0}$ -separation of $t$ , then for $t_{0}=D_{0}[t_{y}]$ we have

$$e^{t_{0}}_{\overline{x}}\leq n^{t_{0}}_{\overline{x}}+n^{t}_{x}\,l_{0}\tag{$x$ bound}$$

*Proof 13**.*

We prove the result by induction over the pair $(\operatorname{\mathsf{ht}}(t_{0}),\operatorname{\mathsf{ht}}(t))$ (with lexicographic ordering).

Base Case Assume $\operatorname{\mathsf{ht}}(t_{0})\leq p$ . Then $e^{t_{0}}_{\overline{x}}\leq l_{0}$ . Since $\operatorname{\mathsf{yd}}(t)\in\mathsf{CSD}_{s}$ , $n^{t}_{x}>0$ , thus $n^{t_{0}}_{\overline{x}}+n^{t}_{x}l_{0}\geq l_{0}$ , which ensures the bound.

Induction step If $\operatorname{\mathsf{ht}}(t_{0})>p$ then $\operatorname{\mathsf{ht}}(t)\geq\operatorname{\mathsf{ht}}(t_{0})>p$ . We apply Lemma 1 to $t$ to yield a decomposition $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ , where $t^{\prime}=C_{1}[C_{3}[t_{5}]]\in{\mathbf{T}}$ , $\operatorname{\mathsf{ht}}(t^{\prime})<\operatorname{\mathsf{ht}}(t)$ and $\operatorname{\mathsf{ht}}(C_{2}[C_{3}[C_{4}]])\leq p$ . Notice that $t_{0}$ cannot overlap with $C_{2}[C_{3}[C_{4}]]$ without overlapping with $C_{1}$ or $t_{5}$ as well, for otherwise $\operatorname{\mathsf{ht}}(t_{0})\leq p$ .

As in the proof of Lemma 13, letting $C_{x},C_{0},t^{\prime}_{y}$ be obtained by removing all nodes from $C_{2}$ and $C_{4}$ from $D_{x}$ , $D_{0}$ and $t_{y}$ respectively, we obtain an $x,y,l$ -separation $t^{\prime}=C_{x}[C_{0}[t^{\prime}_{y}]]$ . We let $t^{\prime}_{0}=C_{0}[t^{\prime}_{y}]$ and distinguish between possible configurations for $C_{2}$ and $C_{4}$ :

Case 0 If neither $C_{2}$ nor $C_{4}$ generates any $\overline{x}$ -token, we find by induction

$$e^{t^{\prime}_{0}}_{\overline{x}}\leq n^{t^{\prime}_{0}}_{\overline{x}}+n^{t^{\prime}}_{x}\,l_{0}$$

Moreover, we have $e^{t^{\prime}_{0}}_{\overline{x}}=e^{t_{0}}_{\overline{x}}$ , $n^{t^{\prime}_{0}}_{\overline{x}}=n^{t_{0}}_{\overline{x}}$ and $n^{t^{\prime}}_{x}\leq n^{t}_{x}$ which concludes.

Case 1 In this case Lemma 11 applies i.e. $C_{2}$ and $C_{4}$ generate only some $\overline{z}$ -tokens and brackets. The only subcase not already covered by Case 0 is the one where $\overline{z}=\overline{x}$ . Notice that $n^{t^{\prime}}_{x}=n^{t}_{x}$ . By induction,

$$e^{t^{\prime}_{0}}_{\overline{x}}\leq n^{t^{\prime}_{0}}_{\overline{x}}+n^{t^{\prime}}_{x}\,l_{0}$$

If $t_{0}$ does not overlap with $C_{2}$ or $C_{4}$ , we have $e^{t^{\prime}_{0}}_{\overline{x}}=e^{t_{0}}_{\overline{x}}$ and $n^{t^{\prime}_{0}}_{\overline{x}}=n^{t_{0}}_{\overline{x}}$ , which ensures the bound. Otherwise $t_{0}$ overlaps with $C_{4}$ . If all nodes of $t_{0}$ are contained in $C_{4}[t_{5}]$ , then by Lemma 11, $t_{0}$ generates no $y$ -token. By separation, neither does $t$ , which contradicts $\operatorname{\mathsf{yd}}(t)\in\mathsf{CSD}_{s}$ . Hence $t_{0}$ contains all nodes of $C_{4}$ . Then by Lemma 11 again, $n^{C_{2}[C_{4}]}_{\overline{x}}=n^{C_{4}}_{\overline{x}}$ , hence $n^{t_{0}}_{\overline{x}}=n^{t^{\prime}_{0}}_{\overline{x}}+n^{C_{2}[C_{4}]}_{\overline{x}}$ and $e^{t_{0}}_{\overline{x}}\leq e^{t^{\prime}_{0}}_{\overline{x}}+n^{C_{2}[C_{4}]}_{\overline{x}}$ , which yields

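$$e^{t_{0}}_{\overline{x}}\leq n^{t^{\prime}_{0}}_{\overline{x}}+n^{t^{\prime}}_{x}\,l_{0}+n^{C_{2}[C_{4}]}_{\overline{x}}=n^{t_{0}}_{\overline{x}}+n^{t}_{x}\,l_{0}$$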

Case 2 In this case Lemma 12 applies and at least one of $C_{2}$ , $C_{4}$ generates some token $z\in\{a,b,c,d\}$ . The only subcase not already dealt with in Case 0 is the one where we can set $z=x$ . We thus get inductively:

$$e^{t^{\prime}_{0}}_{\overline{x}}\leq n^{t^{\prime}_{0}}_{\overline{x}}+n^{t^{\prime}}_{x}\,l_{0}$$

Since $C_{2}$ or $C_{4}$ generates at least one $x$ -token, we have $n^{t}_{x}\geq n^{t^{\prime}}_{x}+1$ . Moreover $e^{t_{0}}_{\overline{x}}\leq e^{t^{\prime}_{0}}_{\overline{x}}+l_{0}$ since $C_{2}[C_{4}]$ generates at most $l_{0}$ $\overline{x}$ -edges, and $n^{t_{0}}_{\overline{x}}\geq n^{t^{\prime}_{0}}_{\overline{x}}$ . So we have $e^{t_{0}}_{\overline{x}}\leq n^{t^{\prime}_{0}}_{\overline{x}}+n^{t^{\prime}}_{x}l_{0}+l_{0}\leq n^{t_{0}}_{\overline{x}}+n^{t}_{x}l_{0}$ , concluding the proof.
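Spelled out step by step, the final chain of Case 2 combines the three facts just listed. The derivation below assumes that the induction hypothesis for $t^{\prime}$ has the same form as the claimed bound, namely $e^{t^{\prime}_{0}}_{\overline{x}}\leq n^{t^{\prime}_{0}}_{\overline{x}}+n^{t^{\prime}}_{x}l_{0}$ , which is what the chain above uses.

```latex
\begin{align*}
e^{t_{0}}_{\overline{x}}
  &\leq e^{t'_{0}}_{\overline{x}} + l_{0}
  && \text{($C_{2}[C_{4}]$ generates at most $l_{0}$ $\overline{x}$-edges)}\\
  &\leq n^{t'_{0}}_{\overline{x}} + n^{t'}_{x}\,l_{0} + l_{0}
  && \text{(induction hypothesis for $t'$)}\\
  &= n^{t'_{0}}_{\overline{x}} + (n^{t'}_{x}+1)\,l_{0}\\
  &\leq n^{t_{0}}_{\overline{x}} + n^{t}_{x}\,l_{0}
  && \text{(since $n^{t'_{0}}_{\overline{x}}\leq n^{t_{0}}_{\overline{x}}$
            and $n^{t'}_{x}+1\leq n^{t}_{x}$)}
\end{align*}
```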

*Lemma 7-2**.*

If $t\in{\mathbf{T}}$ is $x,y,l_{0}$ -separated then there exists a minimal $x,y,l_{0}$ -separation $D_{x}[D_{0}[t_{y}]]$ of $t$ such that, letting $t_{0}=D_{0}[t_{y}]$ , we have

$$e^{t_{0}}_{\overline{y}}\geq n^{t}_{\overline{y}}-n^{t}_{x}\,l_{0}-r^{t}_{\overline{y}}\tag{$y$ bound}$$

*Proof 14**.*

We prove the result by induction over the height of $t$ .

$t$ is $x,y,l_{0}$ -separated so let us consider $D_{x}[D_{0}[t_{y}]]$ a minimal $x,y,l_{0}$ -separation of $t$ . Let $t_{0}=D_{0}[t_{y}]$ .

Base Case Assume $\operatorname{\mathsf{ht}}(t)\leq p$ . Then $n^{t}_{\overline{y}}\leq l_{0}$ . Since $\operatorname{\mathsf{yd}}(t)\in\mathsf{CSD}_{s}$ , $n^{t}_{x}>0$ , hence $n^{t}_{x}l_{0}\geq l_{0}\geq n^{t}_{\overline{y}}$ . Moreover, $0\leq e^{t_{0}}_{\overline{y}}$ and $r^{t}_{\overline{y}}\geq 0$ . So $n^{t}_{\overline{y}}-n^{t}_{x}l_{0}-r^{t}_{\overline{y}}\leq 0\leq e^{t_{0}}_{\overline{y}}$ , which ensures the bound.

Induction step If $\operatorname{\mathsf{ht}}(t)>p$ , we apply Lemma 1 to $t$ to yield a decomposition $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ , where $t^{\prime}=C_{1}[C_{3}[t_{5}]]\in{\mathbf{T}}$ , $\operatorname{\mathsf{ht}}(t^{\prime})<\operatorname{\mathsf{ht}}(t)$ and $\operatorname{\mathsf{ht}}(C_{2}[C_{3}[C_{4}]])\leq p$ . By Lemma 13, $t^{\prime}$ is $x,y,l_{0}$ -separated. We let $t^{\prime}=C_{x}[C_{0}[t^{\prime}_{y}]]$ be a minimal separation of $t^{\prime}$ verifying the bound and $t_{0}^{\prime}=C_{0}[t^{\prime}_{y}]$ . In other words, we have:

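$$e^{t^{\prime}_{0}}_{\overline{y}}\geq n^{t^{\prime}}_{\overline{y}}-n^{t^{\prime}}_{x}\,l_{0}-r^{t^{\prime}}_{\overline{y}}\tag{3}$$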

By Lemma 14, $t_{0}=D_{0}[t_{y}]$ contains all nodes of $t^{\prime}_{0}$ . We distinguish cases according to Lemmas 11 and 12.

Case 1 Consider first the case where Lemma 11 applies i.e. $C_{2}$ and $C_{4}$ generate only one kind of bar token, $\overline{z}$ , and brackets. We now distinguish cases depending on the value of $\overline{z}$ . Before this, we emphasize that in all subcases it holds that $n^{t}_{x}=n^{t^{\prime}}_{x}$ .

subcase i) $\overline{z}\neq\overline{y}$ . Since all nodes of $t^{\prime}_{0}$ are contained in $t_{0}$ , we have $e^{t_{0}}_{\overline{y}}\geq e^{t^{\prime}_{0}}_{\overline{y}}$ . Since $C_{2}$ and $C_{4}$ generate no $\overline{y}$ -token, we have $n^{t}_{\overline{y}}=n^{t^{\prime}}_{\overline{y}}$ and $r^{t}_{\overline{y}}=r^{t^{\prime}}_{\overline{y}}$ . Injecting into inequality (3) concludes.

subcase ii) $\overline{z}=\overline{y}$ . We distinguish the different possible overlaps of $C_{2}$ and $C_{4}$ with $t_{0}$ . Notice first that, by minimality, if any $C_{i}$ , $i\in\{2,4\}$ , overlaps with $t_{0}$ then $t_{0}$ contains all nodes of $C_{i}$ , for otherwise we would have $D_{0}=D_{0}^{\prime}[D_{0}^{\prime\prime}]$ with $D_{0}^{\prime}$ a subcontext of $C_{i}$ such that $\operatorname{\mathsf{ht}}(D_{0}^{\prime})>0$ , and in that case $(D_{x}[D_{0}^{\prime}],D_{0}^{\prime\prime},t_{y})$ would be a smaller $x,y,l$ -separation of $t$ since $C_{i}$ (hence $D_{0}^{\prime}$ ) does not generate $y$ -tokens.

Hence, in the case where $t_{0}$ overlaps with $C_{2}$ , $t_{0}$ contains all nodes of $C_{2}$ and $C_{4}$ . Since $t_{0}$ also contains all nodes of $t^{\prime}_{0}$ , $e^{t_{0}}_{\overline{y}}\geq e^{t^{\prime}_{0}}_{\overline{y}}+e^{C_{2}[C_{4}]}_{\overline{y}}=e^{t^{\prime}_{0}}_{\overline{y}}+n^{t}_{\overline{y}}-n^{t^{\prime}}_{\overline{y}}$ . Moreover, $r^{t}_{\overline{y}}\geq r^{t^{\prime}}_{\overline{y}}$ . We can then conclude using inequality (3).

Consider now the case where $t_{0}$ does not overlap with $C_{2}$ or $C_{4}$ . Since all $y$ -tokens are generated by $t_{0}$ , projectivity of the yield and the definition of $\mathsf{CSD}$ impose that $r^{t}_{\overline{y}}=r^{t^{\prime}}_{\overline{y}}+n^{C_{2}[C_{4}]}_{\overline{y}}$ . We further have $e^{t_{0}}_{\overline{y}}\geq e^{t^{\prime}_{0}}_{\overline{y}}$ , and injecting into inequality (3), using $n^{t^{\prime}}_{\overline{y}}=n^{t}_{\overline{y}}-n^{C_{2}[C_{4}]}_{\overline{y}}$ and $n^{t^{\prime}}_{x}=n^{t}_{x}$ , yields $e^{t_{0}}_{\overline{y}}\geq n^{t}_{\overline{y}}-n^{C_{2}[C_{4}]}_{\overline{y}}-n^{t}_{x}l_{0}-r^{t}_{\overline{y}}+n^{C_{2}[C_{4}]}_{\overline{y}}$ , which simplifies into the desired $y$ bound.

Finally, in the case where $C_{2}$ does not overlap with $t_{0}$ but $C_{4}$ does, all nodes of $C_{2}$ are contained in $D_{x}$ and all nodes of $C_{4}$ are contained in $t_{0}$ . We must then have $|\operatorname{\mathsf{yd}}(C_{3})|_{y}>0$ . Otherwise, there would exist an $x,y,l$ -separation $E_{x}[E_{0}[t_{y}]]$ with $E_{x}=C_{1}[C_{2}[C_{3}[C_{4}]]]$ , and $\operatorname{\mathsf{ht}}(E_{0})<\operatorname{\mathsf{ht}}(D_{0})$ . Assume $|\operatorname{\mathsf{yd}}(C_{3})|_{x}>0$ . Lemma 11, point 2, ensures that $|\operatorname{\mathsf{left}}(C_{3})|_{x,y}=0$ or $|\operatorname{\mathsf{right}}{(C_{3})}|_{x,y}=0$ . Assume $|\operatorname{\mathsf{right}}(C_{3})|_{x,y}=0$ (the other case is symmetric). We then have both an $x$ and a $y$ generated on the left of $C_{3}$ . Since neither $C_{1}[C_{2}]$ nor $t_{5}$ generates any $y$ -token, projectivity imposes $r^{t}_{\overline{y}}=r^{t^{\prime}}_{\overline{y}}+n^{C_{2}[C_{4}]}_{\overline{y}}$ and we can conclude as in the previous case. The only remaining subcase is when $|\operatorname{\mathsf{yd}}(C_{3})|_{x}=0$ , in which case $t$ is [math]-separated, and considering the (minimal) [math]-separation $(C_{1},X,C_{2}[C_{3}[C_{4}]])$ we can use the same argument as in the case where $t_{0}$ encompasses all nodes of $C_{2}$ and $C_{4}$ .

Case 2 Consider now the remaining case where Lemma 12 applies. If neither $C_{2}$ nor $C_{4}$ generates any $x$ - or $y$ -token, they do not generate $\overline{x}$ - or $\overline{y}$ -tokens either, and the same reasoning as in Case 1 subcase i) applies. Otherwise $C_{2}[C_{4}]$ generates at least one $x$ -token. We then have $n^{t}_{x}\geq n^{t^{\prime}}_{x}+1$ . Since $t_{0}$ contains all nodes from $t^{\prime}_{0}$ we further have $e^{t_{0}}_{\overline{y}}\geq e^{t^{\prime}_{0}}_{\overline{y}}$ . Finally $n^{t}_{\overline{y}}\leq n^{t^{\prime}}_{\overline{y}}+l_{0}$ . We conclude using inequality (3).

B.6 Conclusion

*Lemma 8**.*

For any $t\in{\mathbf{T}}$ , if $t$ is $x,y,l_{0}$ -separated then $t$ is $x,y,l_{0}$ -asynchronous.

*Proof 15**.*

By Lemma 7-2, there is a minimal $x,y,l_{0}$ -separation $t=D_{x}[D_{0}[t_{y}]]$ such that the $y$ bound obtains for $t_{0}=D_{0}[t_{y}]$ . By Lemma 7-1 the $x$ bound obtains for $t_{0}$ as well. Observe finally that $r^{t}_{\overline{y}}\leq m^{t}_{\overline{y}}$ and, since $t_{0}$ generates at most $l_{0}$ $x$ -tokens, by projectivity and definition of $\mathsf{CSD}$ , it generates at most $(l_{0}+1)m^{t}_{\overline{x}}$ $\overline{x}$ -tokens (one sequence of $m^{t}_{\overline{x}}$ between each occurrence of $x$ and the next, plus possibly one in front of the first and one after the last). Hence,

[TABLE]

and $t$ is $x,y,l_{0}$ -asynchronous.

*Lemma 9**.*

For any $t\in{\mathbf{T}}$ , $t$ is $x,y,l_{0}$ -separated for some $x/y\in\mathsf{Sep}$ .

*Proof 16**.*

The proof proceeds by induction on the height of $t$ .

If $\operatorname{\mathsf{ht}}(t)\leq p$ , then $|\operatorname{\mathsf{yd}}(t)|_{z}\leq l_{0}$ for any $z\in\{a,b,c,d\}$ , hence $t$ is trivially $x,y,l_{0}$ -separated for some $x/y\in\mathsf{Sep}$ .

If $\operatorname{\mathsf{ht}}(t)>p$ , Lemma 1 yields a decomposition $t=C_{1}[C_{2}[C_{3}[C_{4}[t_{5}]]]]$ , where $t^{\prime}=C_{1}[C_{3}[t_{5}]]\in{\mathbf{T}}$ , $\operatorname{\mathsf{ht}}(t^{\prime})<\operatorname{\mathsf{ht}}(t)$ and $\operatorname{\mathsf{ht}}(C_{2}[C_{3}[C_{4}]])\leq p$ . By induction $t^{\prime}$ is $x,y,l_{0}$ -separated for some $x/y\in\mathsf{Sep}$ . For the sake of succinctness, let us present the inductive step for $x/y=a/c$ ; the reasoning for the other cases is analogous. Let us examine the different possible configurations of $C_{2}$ and $C_{4}$ .

Case 1 If Lemma 11 applies i.e. $C_{2}$ and $C_{4}$ generate only one kind of bar token, $\overline{z}$ , and brackets, one checks easily that inserting $C_{2}$ and $C_{4}$ does not change the distribution of $a$ and $c$ -tokens in the tree, hence $t$ is $a,c,l_{0}$ -separated.

Case 2 If Lemma 12 applies, note first that if $C_{2}$ and $C_{4}$ generate no $a$ or $c$ -token, we can conclude as in Case 1 as the distribution of $a$ and $c$ -tokens in the tree is not changed either. Otherwise, we assume that $C_{2}$ or $C_{4}$ generate some $a$ or $c$ -token and distinguish between subcases 1-3 of Lemma 12:

Subcase 1 In this case, for some $i\in\{2,4\}$ , $\operatorname{\mathsf{left}}(C_{i})$ contains an $a$ -token and no $b,c$ or $d$ -token while $\operatorname{\mathsf{right}}(C_{i})$ contains some $c$ -token and no $a,b$ or $d$ -token. Assume $i=2$ , the case where $i=4$ is similar. By projectivity and the definition of $\mathsf{CSD}_{s}$ , it follows that all $b$ -tokens are generated in $C_{3}[C_{4}[t_{5}]]$ and all $d$ -tokens in $C_{1}$ . $t$ is therefore $b,d,0$ -separated, hence $b,d,l_{0}$ -separated.

Subcase 2 In this case, $\operatorname{\mathsf{left}}(C_{2})$ contains some $a$ -token and no $b,c,d$ -token, $\operatorname{\mathsf{right}}(C_{2})$ contains some $d$ -token and no $a,b,c$ -token, $\operatorname{\mathsf{left}}(C_{4})$ contains some $b$ -token and no $a,c,d$ -token, $\operatorname{\mathsf{right}}(C_{4})$ contains some $c$ -token and no $a,b,d$ -token. It follows that $t_{5}$ generates no occurrence of $a$ and $C_{1}$ no occurrence of $c$ . Since $|\operatorname{\mathsf{yd}}(C_{2}[C_{3}[C_{4}]])|_{a}\leq l_{0}$ , $(C_{1},C_{2}[C_{3}[C_{4}]],t_{5})$ is an $a,c,l_{0}$ -separation.

Subcase 3 Assume $\operatorname{\mathsf{left}}(C_{2})$ contains some $a$ -token and no $b,c,d$ -token and that $\operatorname{\mathsf{left}}(C_{4})$ contains some $c$ -token. It follows that all $b$ -tokens are generated by $C_{3}$ . So $\operatorname{\mathsf{yd}}(t)$ contains fewer than $l_{0}$ $b$ -tokens; by definition of $\mathsf{CSD}$ it also contains fewer than $l_{0}$ $d$ -tokens, so $(C_{1},C_{2}[C_{3}[C_{4}]],t_{5})$ is a $d,b,l_{0}$ -separation.

Assume now $\operatorname{\mathsf{left}}(C_{2})$ contains some $a$ -token and no $b,c,d$ -token and that $\operatorname{\mathsf{right}}(C_{4})$ contains some $c$ -token. It follows that $t_{5}$ generates no $d$ -token and $C_{1}$ generates no $b$ -token. Hence $(C_{1},C_{2}[C_{3}[C_{4}]],t_{5})$ is a $b,d,l_{0}$ -separation.

The remaining cases are symmetric exchanging $c$ with $a$ , $d$ with $b$ , and ${\operatorname{\mathsf{left}}}$ with ${\operatorname{\mathsf{right}}}$ everywhere.

