Semantic expressive capacity with bounded memory
Antoine Venant, Alexander Koller

TL;DR
This paper explores the limits of semantic parsing mechanisms, showing that projective systems require unbounded memory to represent certain relations, unlike nonprojective systems, impacting both grammar-based and neural models.
Contribution
It provides the first proof demonstrating the memory requirements for projective semantic parsing mechanisms, highlighting fundamental differences from nonprojective systems.
Findings
Projective mechanisms need unbounded memory for certain relations
Nonprojective mechanisms can represent these relations without unbounded memory
Implications for the design of grammar-based and neural semantic parsers
Abstract
We investigate the capacity of mechanisms for compositional semantic parsing to describe relations between sentences and semantic representations. We prove that in order to represent certain relations, mechanisms which are syntactically projective must be able to remember an unbounded number of locations in the semantic representations, where nonprojective mechanisms need not. This is the first result of this kind, and has consequences both for grammar-based and for neural systems.
Department of Language Science and Technology
Saarland University
{venant|koller}@coli.uni-saarland.de
1 Introduction
Semantic parsers which translate a sentence into a semantic representation compositionally must recursively compute a partial semantic representation for each node of a syntax tree. These partial semantic representations usually contain placeholders at which arguments and modifiers are attached in later composition steps. Approaches to semantic parsing differ in whether they assume that the number of placeholders is bounded or not. Lambda calculus (Montague, 1974; Blackburn and Bos, 2005) assumes that the number of placeholders (lambda-bound variables) can grow unboundedly with the length and complexity of the sentence. By contrast, many methods which are based on unification (Copestake et al., 2001) or graph merging (Courcelle and Engelfriet, 2012; Chiang et al., 2013) assume a fixed set of placeholders, i.e. the number of placeholders is bounded.
Methods based on bounded placeholders are popular both in the design of hand-written grammars (Bender et al., 2002) and in semantic parsing for graphs (Peng et al., 2015; Groschwitz et al., 2018). However, it is not clear that all relations between language and semantic representations can be expressed with a bounded number of placeholders. The situation is particularly challenging when one insists that the compositional analysis is projective in the sense that each composition step must combine adjacent substrings of the input sentence. In this case, it may be impossible to combine a semantic predicate with a distant argument immediately, forcing the composition mechanism to use up a placeholder to remember the argument position. If many predicates have distant arguments, this may exceed the bounded “memory capacity” of the compositional mechanism.
In this paper, we show that there are relations between sentences and semantic representations which can be described by compositional mechanisms which are bounded and non-projective, but not by ones which are bounded and projective. To our knowledge, this is the first result on expressive capacity with respect to semantics – in contrast to the extensive literature on the expressive capacity of mechanisms which describe just the string languages.
More precisely, we prove that tree-adjoining grammars can describe string-graph relations using the HR graph algebra (Courcelle and Engelfriet, 2012) with two sources (bounded, non-projective) which cannot be described using linear monadic context-free tree grammars and the HR algebra with $k$ sources, for any fixed $k$ (bounded, projective). This result is especially surprising because TAG and linear monadic CFTGs describe the same string languages; thus the difference lies only in the projectivity of the syntactic analysis.
We further prove that given certain assumptions on the alignment between tokens in the sentence and edges in the graph, no generative device for projective syntax trees can simulate TAG with two sources. This has practical consequences for the design of transition-based semantic parsers (whether grammar-based or neural).
Plan of the paper. We will first explain the linguistic background in Section 2 and lay the formal foundations in Section 3. We will then prove the reduced semantic expressive capacity for aligned generative devices in Section 4 and for CFTGs in Section 5. We conclude with a discussion of the practical impact of our findings (Section 6).
2 Compositional semantic construction
The Principle of Compositionality, which is widely accepted in theoretical semantics, states that the meaning of a natural-language expression can be determined from the meanings of its immediate subexpressions and the way in which the subexpressions were combined. Implementations of this principle usually assume that there is some sort of syntax tree which describes the grammatical structure of a sentence. A semantic representation is then calculated by bottom-up evaluation of this syntax tree, starting with semantic representations of the individual words and then recursively computing a semantic representation for each node from those of its children.
2.1 Compositional mechanisms
Mechanisms for semantic composition will usually keep track of places at which semantic arguments are still missing or modifiers can still be attached. For instance, when combining the semantic representations for “John” and “sleeps” in a derivation of “John sleeps”, the “subject” argument of “sleeps” is filled with the meaning of “John”. The compositional mechanism therefore assigns a semantic representation to “sleeps” which has an unfilled placeholder for the subject.
The exact nature of the placeholder depends on the compositional mechanism. There are two major classes in the literature. Lambda-style compositional mechanisms use a list of placeholders. For instance, lambda calculus, as used e.g. in Montague Grammar (Montague, 1974), CCG (Steedman, 2001), or linear-logic-based approaches in LFG (Dalrymple et al., 1995), might represent “sleeps” as $\lambda x.\,\mathsf{sleep}(x)$. Placeholders are lambda-bound variables (here: $x$).
By contrast, unification-style compositional mechanisms use names for placeholders. For example, a simplified form of the Semantic Algebra used in HPSG (Copestake et al., 2001) might represent “sleeps” as a feature structure with a named hole for its subject, which is then filled by unification with the representation of “John”. The placeholders are holes with labels from a fixed set of argument names. Named placeholders are also used in the HR algebra (Courcelle and Engelfriet, 2012) and its derivatives, like Hyperedge Replacement Grammars (Drewes et al., 1997; Chiang et al., 2013) and the AM algebra (Groschwitz et al., 2018).
2.2 Boundedness and projectivity
A fundamental difference between lambda-style and unification-style compositional mechanisms is in their “memory capacity”: the number of placeholders in a lambda-style mechanism can grow unboundedly with the length and complexity of the sentence (e.g. by functional composition of lambda terms), whereas in a unification-style mechanism, the placeholders are fixed in advance.
There is an informal intuition that unbounded memory is needed especially when an unbounded number of semantic predicates can be far away from their arguments in the sentence, and the syntax formalism does not allow these predicates to combine immediately with the arguments. For illustration, consider the two derivations of the following Swiss German sentence from Shieber (1985) in Fig. 1:
(dass) (mer) d’ chind em Hans es huus lönd hälfed aastriiche
(that) (we) the-children-ACC Hans-DAT the-house-ACC let help paint
‘(that we) let the children help Hans paint the house’
The lexical semantic representation of each verb comes with a placeholder for its object and, in the case of “lönd” and “hälfed”, also one for its verb complement. The derivation in Fig. 1a immediately combines each verb with its complements; the placeholders that are used at each node never grow beyond the ones the verbs originally had. However, this derivation combines verbs with nouns which are not adjacent in the string, which is not allowed in many grammar formalisms. If we limit ourselves to combining only adjacent substrings (projectively, see Fig. 1b), we must remember the placeholders for all the verbs at the same time if we want to obtain the correct predicate-argument structure. Thus, the number of placeholders grows with the length of the sentence; this is only possible with a lambda-style compositional mechanism.
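The tension described above can be made concrete with a toy count. The following Python sketch is an illustration only, not the formal argument of this paper: it assumes a sentence of n nouns followed by n verbs, where verb i takes noun i as its object, and it assumes that every predicate-argument pair with exactly one member inside a span costs that span one open placeholder. Verb complements are ignored. All function names are hypothetical. The sketch searches all projective binary bracketings and reports the smallest achievable peak number of open placeholders.

```python
from functools import lru_cache
from typing import List, Tuple


def cross_serial_pairs(n: int) -> List[Tuple[int, int]]:
    """Positions of n nouns followed by n verbs; verb i takes noun i as object."""
    return [(i, n + i) for i in range(n)]


def open_pairs(pairs: List[Tuple[int, int]], i: int, j: int) -> int:
    """Count pairs with exactly one member inside the span [i, j)."""
    return sum((i <= a < j) != (i <= b < j) for a, b in pairs)


def min_projective_memory(n: int) -> int:
    """Smallest peak number of open pairs over all projective binary bracketings.

    Assumption (toy model): each open pair costs one placeholder at that span.
    """
    pairs = cross_serial_pairs(n)

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        here = open_pairs(pairs, i, j)
        if j - i == 1:  # a single word
            return here
        # Try every way of splitting [i, j) into two adjacent substrings.
        return min(max(here, best(i, k), best(k, j)) for k in range(i + 1, j))

    return best(0, 2 * n)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, min_projective_memory(n))
```

Under these assumptions the peak grows with n for every projective bracketing, whereas a derivation that may combine each verb directly with its (non-adjacent) object never has more than one open object placeholder per word.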
There is scattered evidence in the literature for this tension between bounded memory and projectivity. Chiang et al. (2013) report (of a compositional mechanism based on the HR algebra, unification-style) that a bounded number of placeholders suffices to derive the graphs in the AMR version of the Geoquery corpus, but Groschwitz et al. (2018) find that this requires non-projective derivations in 37% of the AMRBank training data (Banarescu et al., 2013). Approaches to semantic construction with tree-adjoining grammar either perform semantic composition along the TAG derivation tree using unification (non-projective, unification-style; Gardent and Kallmeyer, 2003) or along the TAG derived tree using linear logic (projective, lambda-style; Frank and van Genabith, 2001). Bender (2008) discusses the challenges involved in modeling the predicate-argument structure of a language with very free word order (Wambaya) with projective syntax. While the Wambaya noun phrase does not seem to require the projective grammar to collect unbounded numbers of unfilled arguments as in Fig. 1b, Bender notes that her projective analysis still requires a more flexible handling of semantic arguments than the HPSG Semantic Algebra (unification-style) supports.
In this paper, we define a notion of semantic expressive capacity and prove the first formal results about the relationship between projectivity and bounded memory.
3 Formal background
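Let $\mathbb{N}$ be the nonnegative integers. A signature $\Sigma$ is a finite set of function symbols $f$, each of which has been assigned a nonnegative integer called its rank. We write $\Sigma_n$ for the symbols of rank $n$. Given a signature $\Sigma$, we say that all constants $a \in \Sigma_0$ are trees over $\Sigma$; further, if $f \in \Sigma_n$ and $t_1, \ldots, t_n$ are trees over $\Sigma$, then $f(t_1, \ldots, t_n)$ is also a tree. We write $T_\Sigma$ for the set of all trees over $\Sigma$. We define the height $\mathrm{ht}(t)$ of a tree $t = f(t_1, \ldots, t_n)$ to be $1 + \max_i \mathrm{ht}(t_i)$, and $\mathrm{ht}(c) = 1$ for $c \in \Sigma_0$.
Let $x \notin \Sigma$, and let $\Sigma_x = \Sigma \cup \{x\}$ (with $x$ as a constant of rank 0). Then we call a tree $C \in T_{\Sigma_x}$ a context if it contains exactly one occurrence of $x$, and write $\mathcal{C}_\Sigma$ for the set of all contexts. A context can be seen as a tree with exactly one hole. If $t \in T_\Sigma$, we write $C[t]$ for the tree in $T_\Sigma$ that is obtained by replacing $x$ with $t$.
Given a string $w$ and a symbol $a$, we write $|w|_a$ for the number of times that $a$ occurs in $w$.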
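The definitions of trees, height, contexts, and symbol counts can be sketched in a few lines of Python. This is a minimal illustration, not part of the formal development; the class and function names, the hole marker `HOLE`, and the convention that a constant has height 1 are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Tuple

HOLE = "x"  # hypothetical name for the hole symbol of a context


@dataclass(frozen=True)
class Tree:
    """A tree f(t_1, ..., t_n) over a ranked signature; n = 0 for constants."""
    label: str
    children: Tuple["Tree", ...] = ()


def height(t: Tree) -> int:
    """Height of a tree, assuming that a constant has height 1."""
    if not t.children:
        return 1
    return 1 + max(height(c) for c in t.children)


def count_holes(t: Tree) -> int:
    """Number of occurrences of the hole symbol in t."""
    own = 1 if (t.label == HOLE and not t.children) else 0
    return own + sum(count_holes(c) for c in t.children)


def is_context(t: Tree) -> bool:
    """A context is a tree that contains exactly one hole."""
    return count_holes(t) == 1


def plug(context: Tree, t: Tree) -> Tree:
    """Return C[t]: the context with its hole replaced by t."""
    if context.label == HOLE and not context.children:
        return t
    return Tree(context.label, tuple(plug(c, t) for c in context.children))


def count_symbol(w: str, a: str) -> int:
    """|w|_a: the number of times the symbol a occurs in the string w."""
    return w.count(a)


if __name__ == "__main__":
    c = Tree("S", (Tree(HOLE), Tree("sleeps")))
    print(is_context(c), plug(c, Tree("John")), height(plug(c, Tree("John"))))
```

Representing contexts as ordinary trees with one distinguished leaf keeps `plug` a plain recursive substitution.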
3.1 Grammars for strings and trees
We take a very general view on how semantic representations for strings are constructed compositionally. To this end, we define a notion of “grammar” which encompasses more devices for describing languages than just traditional grammars, such as transition-based parsers.
We say that a tree grammar $G$ over the signature $\Sigma$ is any finite device that defines a language $L(G) \subseteq T_\Sigma$. For instance, regular tree grammars (Comon et al., 2007) are tree grammars, and context-free grammars can also be seen as tree grammars defining the language of parse trees.
We say that a string grammar $\mathbb{G} = (G, \mathrm{yd})$ over the signature $\Sigma$ and the alphabet $W$ is a pair consisting of a tree grammar $G$ over $\Sigma$ and a yield function $\mathrm{yd}: T_\Sigma \to W^*$ which maps trees to strings over $W$ (Weir, 1988). A string grammar defines a language $L(\mathbb{G}) = \{\mathrm{yd}(t) \mid t \in L(G)\}$. We call the trees $t \in L(G)$ derivations.
A particularly common yield function is the function $\mathrm{yield}$, defined as $\mathrm{yield}(f(t_1, \ldots, t_n)) = \mathrm{yield}(t_1) \cdots \mathrm{yield}(t_n)$ if $n > 0$ and $\mathrm{yield}(c) = c$ if $c$ has rank 0. This yield function simply concatenates the words at the leaves of $t$. Applied to the phrase-structure tree $t$ in Fig. 2c, $\mathrm{yield}(t)$ is the Swiss German sentence from Section 2.2. Context-free grammars can be characterized as string grammars that combine a regular tree grammar with $\mathrm{yield}$. By contrast, we can model tree-adjoining grammars (TAG; Joshi and Schabes, 1997) by choosing a tree grammar that describes derivation trees as in Fig. 2b. The function $\mathrm{yd}$ could then substitute and adjoin the elementary trees as specified by the derivation tree (see Fig. 2a) and then read off the words from the resulting derived tree in Fig. 2c.
We say that a string grammar is projective if its yield function is $\mathrm{yield}$. Context-free grammars as construed above are clearly projective. Tree-adjoining grammars are not projective: For instance, the yield of the subtree below “aastriiche” in Fig. 2b consists of the two separate strings “es Huus” and “aastriiche”, which are then wrapped around “lönd hälfed” further up in the derivation.
If the grammar is projective, then for any context $C$ there exist two strings $\mathrm{left}(C)$ and $\mathrm{right}(C)$ such that for any tree $t$, $\mathrm{yield}(C[t]) = \mathrm{left}(C) \cdot \mathrm{yield}(t) \cdot \mathrm{right}(C)$.
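The property in the last paragraph is easy to check on small examples. The sketch below is a minimal illustration, assuming trees are encoded as nested `(label, [children])` pairs with words as string leaves; the hole marker `HOLE` and all function names are hypothetical. It computes the leaf-concatenating yield and the two strings that a context contributes to the left and right of whatever is plugged into its hole.

```python
from typing import List, Tuple, Union

# A tree is either a leaf (a word, or the hole marker) or (label, [children]).
Tree = Union[str, Tuple[str, List["Tree"]]]
HOLE = "[]"  # hypothetical marker for the hole of a context


def leaf_yield(t: Tree) -> List[str]:
    """Concatenate the leaves of t from left to right."""
    if isinstance(t, str):
        return [t]
    _, children = t
    words: List[str] = []
    for child in children:
        words.extend(leaf_yield(child))
    return words


def plug(context: Tree, t: Tree) -> Tree:
    """Return C[t]: the context with its hole replaced by t."""
    if context == HOLE:
        return t
    if isinstance(context, str):
        return context
    label, children = context
    return (label, [plug(c, t) for c in children])


def split_context(context: Tree) -> Tuple[List[str], List[str]]:
    """Return (left(C), right(C)): the words left and right of the hole."""
    words = leaf_yield(context)
    i = words.index(HOLE)
    return words[:i], words[i + 1:]


if __name__ == "__main__":
    context = ("S", [("NP", ["d'", "chind"]), HOLE, ("V", ["lönd"])])
    subtree = ("NP", ["em", "Hans"])
    left, right = split_context(context)
    print(left, leaf_yield(subtree), right)
    print(leaf_yield(plug(context, subtree)))
```

Because the yield of the plugged tree is always one contiguous block between `left` and `right`, a subtree of a projective derivation can never contribute two separated substrings, in contrast to the TAG derivation discussed above.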
3.2 Context-free tree languages
Below, we will talk about linear monadic context-free tree grammars (LM-CFTGs; Rounds, 1969; Comon et al., 2007). An LM-CFTG is a quadruple $G = (N, \Sigma, S, P)$, where $N$ is a ranked signature of nonterminals of rank at most one, $\Sigma$ is a ranked signature of terminals, $S \in N_0$ is the start symbol, and $P$ is a finite set of production rules of one of the forms
- $A \to t$ with $A \in N_0$ and $t \in T_V$
- $A(x) \to C$ with $A \in N_1$ and $C \in \mathcal{C}_V$,
where $V = N \cup \Sigma$. The trees in $L(G)$ are obtained by expanding $S$ with production rules. Nonterminals of rank zero are expanded by replacing them with trees. Nonterminals of rank one must have exactly one child in the tree; they are replaced by a context, and the variable in the context is replaced by the subtree below the child.
We can extend an LM-CFTG $G$ to a string grammar $\mathbb{G} = (G, \mathrm{yield})$. Then LM-CFTG is weakly equivalent to TAG (Kepser and Rogers, 2011); that is, LM-CFTG and TAG generate the same class of string languages. Intuitively, the weakly equivalent LM-CFTG directly describes the language of derived trees of the TAG grammar (cf. Fig. 2c). Notice that LM-CFTG is projective.
Below, we will make crucial use of the following pumping lemma for LM-CFTLs:
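Lemma 1 (Maibaum, 1978).
Let $G$ be an LM-CFTG. There exists a constant $p \in \mathbb{N}$ such that for any $t \in L(G)$ with $\mathrm{ht}(t) > p$, there exists a decomposition $t = C_1[C_2[C_3[C_4[t_5]]]]$ with $\mathrm{ht}(C_2[C_3[C_4]]) \leq p$ and with $C_2$ and $C_4$ not both trivial, such that for any $i \in \mathbb{N}$, $C_1[C_2^i[C_3[C_4^i[t_5]]]] \in L(G)$, where we let $C^0$ be the trivial context consisting only of the hole and $C^{i+1} = C[C^i]$.
We call $p$ the pumping height of $L(G)$.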
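The shape of the pumped trees in Lemma 1 can be sketched as follows. This is a minimal illustration of the operation $C_1[C_2^i[C_3[C_4^i[t_5]]]]$ only; it does not check membership in any grammar. Trees are assumed to be nested `(label, [children])` pairs, and the hole marker `HOLE`, the function names, and the example contexts are all hypothetical.

```python
HOLE = "[]"  # hypothetical marker for the hole of a context


def plug(context, t):
    """Return C[t]: the context with its hole replaced by t."""
    if context == HOLE:
        return t
    if isinstance(context, str):
        return context
    label, children = context
    return (label, [plug(c, t) for c in children])


def power(context, i):
    """C^0 is the bare hole, and C^(i+1) = C[C^i]."""
    result = HOLE
    for _ in range(i):
        result = plug(context, result)
    return result


def pump(c1, c2, c3, c4, t5, i):
    """Build C1[C2^i[C3[C4^i[t5]]]]."""
    return plug(c1, plug(power(c2, i), plug(c3, plug(power(c4, i), t5))))


def leaf_yield(t):
    """Concatenate the leaves of t from left to right."""
    if isinstance(t, str):
        return [t]
    words = []
    for child in t[1]:
        words.extend(leaf_yield(child))
    return words


if __name__ == "__main__":
    c2 = ("X", ["a", HOLE, "d"])  # wraps "a ... d" around its hole
    c4 = ("Y", ["b", HOLE, "c"])  # wraps "b ... c" around its hole
    for i in range(3):
        print(i, " ".join(leaf_yield(pump(HOLE, c2, HOLE, c4, "e", i))))
```

With these example contexts, pumping at two places in the tree repeats material at four places in the projective yield, which is the pattern familiar from TAG string languages.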
3.3 The HR algebra
The specific unification-style semantic algebra we use in this paper is the HR algebra (Courcelle and Engelfriet, 2012). This choice encompasses much of the recent literature on compositional semantic parsing with graphs, based e.g. on Hyperedge Replacement Grammars (Chiang et al., 2013; Peng et al., 2015; Koller, 2015) and the AM algebra (Groschwitz et al., 2018).
The values of the HR algebra are s-graphs: directed, edge-labeled graphs, some of whose nodes may be designated as sources, written in angle brackets. S-graphs can be combined using the forget, rename, and merge operations. Rename changes an $a$-source node into a $b$-source node. Forget makes it so the $a$-source node in the s-graph is no longer a source node. Merge combines two s-graphs while unifying nodes with the same source annotation. For instance, the s-graph $\langle\mathsf{rt}\rangle \xrightarrow{\text{ARG1}} \langle\mathsf{o}\rangle$ and the s-graph consisting of a single node $\langle\mathsf{o}\rangle$ with a loop edge labeled Hans are merged into the s-graph $\langle\mathsf{rt}\rangle \xrightarrow{\text{ARG1}} \langle\mathsf{o}\rangle$ in which the $\langle\mathsf{o}\rangle$ node carries the Hans loop.
The HR algebra uses operation symbols from a ranked signature $\Delta$ to describe s-graphs syntactically. $\Delta$ contains symbols for merge (rank 2) and the forget and rename operations (rank 1). It also contains constants (symbols of rank 0) which denote s-graphs of the form $\langle\mathsf{a}\rangle \xrightarrow{f} \langle\mathsf{b}\rangle$ and s-graphs consisting of a single node $\langle\mathsf{a}\rangle$ with a loop edge labeled $f$, where $\mathsf{a}, \mathsf{b}$ are sources and $f$ is an edge label.
Terms $t$ over this signature evaluate recursively to s-graphs $[\![t]\!]$, as usual in an algebra. Each instance of the HR algebra uses a fixed, finite set of $k$ source names which can be used in the constant s-graphs and the rename and forget operations. The class of graphs which can be expressed as values of terms over the algebra increases with $k$. We write $\mathcal{H}_k$ for the HR algebra with $k$ source names (and some set of edge labels).
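The three operations can be illustrated with a small Python sketch. It is a simplified model rather than a full implementation of the HR algebra: it assumes that an s-graph is given by its edge set plus a map from source names to nodes, that each node carries at most one source name, that rename is only applied when the target name is unused, and that the two arguments of merge were built from separate constants (so their nodes are disjoint). All class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Set, Tuple

Edge = Tuple[int, str, int]  # (start node, edge label, end node)
_fresh = count()  # supply of fresh node ids, so separate constants are disjoint


@dataclass
class SGraph:
    edges: Set[Edge]
    sources: Dict[str, int]  # source name -> node


def edge_graph(a: str, label: str, b: str) -> SGraph:
    """Constant s-graph <a> --label--> <b>."""
    u, v = next(_fresh), next(_fresh)
    return SGraph({(u, label, v)}, {a: u, b: v})


def loop_graph(a: str, label: str) -> SGraph:
    """Constant s-graph with a single a-source node and a labeled loop."""
    u = next(_fresh)
    return SGraph({(u, label, u)}, {a: u})


def rename(g: SGraph, a: str, b: str) -> SGraph:
    """Turn the a-source into a b-source (assumes b is not yet in use)."""
    sources = dict(g.sources)
    sources[b] = sources.pop(a)
    return SGraph(set(g.edges), sources)


def forget(g: SGraph, a: str) -> SGraph:
    """The a-source node stays in the graph but is no longer a source."""
    return SGraph(set(g.edges), {s: n for s, n in g.sources.items() if s != a})


def merge(g1: SGraph, g2: SGraph) -> SGraph:
    """Union of both graphs, unifying nodes that carry the same source name."""
    mapping = {g2.sources[s]: g1.sources[s] for s in g2.sources if s in g1.sources}
    edges = set(g1.edges) | {
        (mapping.get(u, u), label, mapping.get(v, v)) for (u, label, v) in g2.edges
    }
    sources = dict(g1.sources)
    for s, n in g2.sources.items():
        sources.setdefault(s, mapping.get(n, n))
    return SGraph(edges, sources)


if __name__ == "__main__":
    g = merge(edge_graph("rt", "ARG1", "o"), loop_graph("o", "Hans"))
    print(sorted(g.edges), g.sources)
    print(forget(g, "o").sources)
```

The example reproduces the merge described above: the Hans loop ends up on the node at the end of the ARG1 edge, and forgetting the o-source afterwards frees that name for reuse while leaving the graph itself unchanged.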
Let $G$ be an s-graph, and let $H$ be a subgraph of $G$, i.e. a subset of its edges. We call a node a boundary node of $H$ if it is incident both to an edge in $H$ and to an edge that is not in $H$. For instance, the s-graph in Fig. 2e is a subgraph of the one in Fig. 2d; the boundary nodes are drawn shaded in (d). The following lemma holds:
Lemma 2.
Let $G = [\![t]\!]$ be an s-graph, and let $H$ be a subgraph of $G$ such that the s-graph $[\![t']\!]$, for some subterm $t'$ of $t$, contains the same edges as $H$. Then every boundary node in $H$ is a source in $[\![t']\!]$.
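Boundary nodes are simple to compute from the definition. The sketch below is a minimal illustration, assuming a graph is given as a collection of `(start, label, end)` edge triples; the function name and the example graph are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (start node, edge label, end node)


def boundary_nodes(graph_edges: Iterable[Edge], sub_edges: Iterable[Edge]) -> Set[Hashable]:
    """Nodes incident both to an edge of the subgraph and to an edge outside it."""
    sub = set(sub_edges)
    rest = set(graph_edges) - sub
    touched_by_sub = {n for (u, _, v) in sub for n in (u, v)}
    touched_by_rest = {n for (u, _, v) in rest for n in (u, v)}
    return touched_by_sub & touched_by_rest


if __name__ == "__main__":
    graph = [("u", "a", "v"), ("v", "b", "w")]
    print(boundary_nodes(graph, [("u", "a", "v")]))
```

In the example, node `v` is the only boundary node of the subgraph that contains just the a-edge. This matches the intuition behind Lemma 2: a partial result that contains only the a-edge must keep `v` as a source, because a later merge can attach the b-edge to `v` only through a source name.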
3.4 Grammars with semantic interpretations
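Finally, we extend string grammars to compositionally relate strings with semantic representations. Let $\mathbb{G} = (G, \mathrm{yd})$ be a string grammar. The tree grammar $G$ generates a language $L(G) \subseteq T_\Sigma$ of trees. We will map each tree $t \in L(G)$ into a term $h(t)$ over some algebra $\mathcal{A}$ over a signature $\Delta$ using a linear tree homomorphism (LTH) $h: T_\Sigma \to T_\Delta$ (Comon et al., 2007), i.e. by compositional bottom-up evaluation. This defines a relation between strings and values of $\mathcal{A}$:
$$\mathrm{REL}(\mathbb{G}, h, \mathcal{A}) = \{(\mathrm{yd}(t), [\![h(t)]\!]_{\mathcal{A}}) \mid t \in L(G)\}$$
For instance, $\mathcal{A}$ could be some HR algebra $\mathcal{H}_k$; then $\mathrm{REL}(\mathbb{G}, h, \mathcal{H}_k)$ will be a binary relation between strings and s-graphs. In this case, we abbreviate $\mathrm{REL}(\mathbb{G}, h, \mathcal{H}_k)$ as $\mathrm{REL}(\mathbb{G}, h, k)$.
If we look at an entire class $\mathcal{K}$ of string grammars and a fixed algebra, this defines a class of such relations:
$$\mathcal{L}(\mathcal{K}, \mathcal{A}) = \{\mathrm{REL}(\mathbb{G}, h, \mathcal{A}) \mid \mathbb{G} \in \mathcal{K},\ h \text{ an LTH}\}$$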
In the example in Fig. 2, we can define a linear homomorphism to map the derivation tree in (b) to a term which evaluates to the s-graph shown in (d). At the top of this term, the s-graphs at the “chind” and “hälfed” (f,g) nodes are combined into (d) by :
[TABLE]
This non-projective derivation produces the s-graph in (d) using only two sources. By contrast, a homomorphic interpretation of the projective tree (c) has to use at least four sources, as the intermediate result in (e) illustrates.
4 Projective cross-serial dependencies
We will now investigate the ability of projective grammar formalisms to express the string-graph relations that TAG can describe with two sources. We will define a relation CSD and prove that CSD cannot be generated by projective grammar formalisms with a bounded number $k$ of sources. We show this first for arbitrary projective string grammars, under certain assumptions on the alignment of words and graph edges. In Section 5, we drop these assumptions, but focus on LM-CFTGs.
4.1 The relation CSD
To construct , consider the string language , where
[TABLE]
and analogously for . An example string in is Note that can be chosen independently for each segment.
Every string can be uniquely described by , , and a sequence of numbers specifying the ’s used in each segment, where each contain numbers and contain numbers. In the example, we have , , and .
We associate a graph with each string by the construction illustrated in Fig. 3. For each , we define the -th -block to be the graph consisting of nodes with a further outgoing -edge from . In addition, connects to a linear chain of edges with label , and to a linear chain of -edges. consists of a linear chain of the -blocks, followed by the -blocks (defined analogously). We let .
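Note that the string language defined above is a more intricate version of the cross-serial dependency language. CSD can be generated by a TAG grammar along the lines of the one from Section 3.4, using an HR algebra with two sources; thus CSD belongs to the class of relations that TAG generates with the two-source HR algebra.
4.2 CSD with bounded blocks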
The characteristic feature of is that edges which are close together in the graph (e.g. the and edge in an -block) correspond to symbols that can be distant in the string (e.g. and tokens). Projective grammars cannot combine predicates () and arguments () directly because of their distance in the string; intuitively, they must keep track of either the ’s or the ’s for a long time, which cannot be done with a bounded .
Before we go into exploiting this intuition, we first note that its correctness depends on the details of the construction of , in particular the ability to select arbitrary and independent for the different . Consider the derivation on the left of Fig. 4 with its projective yield ; this is the case of , corresponding to the graph shown in Fig. 4 (a). We can map to this graph by applying the following linear tree homomorphism into :
[TABLE]
A derivation of the form evaluates to the same graph as ; the graph value of is ignored. Thus if we assume that the subtree of for evaluates to some arbitrary graph , the complete derivation evaluates to . Some intermediate results are shown on the right of Fig. 4.
If we let be the subset of where all are zero, we can generalize this construction into an LM-CFTG which generates . Thus, can be generated by a projective grammar that is interpreted into . But note that the derivation in Fig. 4 is unnatural in that the symbols in the string are not generated by the same derivation steps that generate the graph nodes that intuitively correspond to them; for instance, the graphs generated for the tokens are completely irrelevant. Below, we prevent unnatural constructions like this in two ways. We will first assume that string symbols and graph nodes must be aligned (Thm. 1). Then we will assume that the can be arbitrary, which allows us to drop the alignment assumption (Thm. 2).
4.3 -distant trees
Let be some relation containing at least the string-graph pairs of , e.g. itself. Assume that is generated by a projective grammar with and a fixed number of sources, i.e. we have . We will prove a contradiction.
Given a pair , we say that two edges in are equivalent, , if they belong to the same block. We call a derivation tree -distant if has a subtree such that we can find edges with for all and further edges such that for all . For such trees, we have the following lemma.
Lemma 3.
A -distant tree has a subtree such that has at least sources.
Proof 1.
Let be the -th block in ; we let and do not distinguish between - and -blocks. Let be the subtree of claimed by the definition of distant trees. For each , let be the edges in the -th block generated by , and let .
By definition, and are both non-empty for at least blocks. Each of these blocks is weakly connected, and thus contains at least one node which is incident both to an edge in and in . This node is a boundary node of . Because are all distinct, it follows from Lemma 2 that has at least sources.
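The counting step in this proof can be sketched in code. The sketch below is an illustration under stated assumptions, not the paper's construction: edges are represented as `(source, label, target)` triples, the labels `"P"` and `"Q"` and the helper `toy_blocks` are hypothetical, and each toy block is reduced to a root with two outgoing edges. A node that touches both an edge generated by the subtree and an edge generated elsewhere is a boundary node of the subgraph, and by Lemma 2 each such node must be a source.

```python
def boundary_nodes(inside_edges, outside_edges):
    """Return the nodes incident to at least one edge on each side of the split.

    Edges are (source, label, target) triples. `inside_edges` stands for the
    edges generated by the subtree, `outside_edges` for all remaining edges.
    """
    def endpoints(edges):
        return {node for (src, _label, tgt) in edges for node in (src, tgt)}

    return endpoints(inside_edges) & endpoints(outside_edges)


def toy_blocks(k):
    """Build k disjoint toy blocks (hypothetical shape, for illustration only).

    Each block has a root with one "P" edge (placed inside the subtree) and
    one "Q" edge (placed outside it).
    """
    inside, outside = [], []
    for i in range(k):
        root = ("block", i, "root")
        inside.append((root, "P", ("block", i, "pred")))
        outside.append((root, "Q", ("block", i, "arg")))
    return inside, outside


if __name__ == "__main__":
    inside, outside = toy_blocks(4)
    # One boundary node per block that is split between inside and outside.
    print(len(boundary_nodes(inside, outside)))
```

Because the blocks are disjoint, the boundary nodes found in different blocks are distinct, so the number of required sources grows with the number of split blocks.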
We also note the following lemma about derivations of projective string grammars, which follows from the inability of projective grammars to combine distant tokens. We write .
Lemma 4.
Let be a projective string grammar. For any there exists such that any with has a subtree such that contains occurrences of and no occurrences of , for some .
4.4 Projectivity and alignments
A consequence of Lemma 3 is that if certain string-graph pairs in can only be expressed with -distant trees, then (which contains these pairs as well) is not in , because only admits sources.
However, as we saw in Section 4.2, pairs in can have unexpected projective derivations which make do with a low number of sources. So let’s assume for now that the string grammar and the tree homomorphism produce tokens and edge labels that fit together. Let us call aligned if for all constants , is a graph containing a single edge with label . The derivation in Fig. 4 cannot be generated by an aligned grammar because the graph for the token contains a -edge. We write $\mathcal{L}_{\leftrightarrow}(\mathbb{G},\mathcal{A})=\{\mathcal{R}\mathcal{E}\mathcal{L}(\mathcal{G},h,\mathcal{A})\mid \mathcal{G}\in\mathbb{G},\ \mathcal{G},h\ \text{aligned}\}$ for the class of string-semantics relations which can be generated with aligned grammars.
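The alignment condition can be checked mechanically. The following is a minimal sketch under assumptions that go beyond the text: the homomorphism restricted to constants is represented as a hypothetical dictionary `lexicon` from each constant's string token to the list of `(source, label, target)` edges of its graph, and "a single edge with the matching label" is read as "exactly one edge whose label equals that token".

```python
def is_aligned(lexicon):
    """Check the alignment condition on a toy lexicon.

    `lexicon` maps the token contributed by each constant to the edges of the
    graph assigned to that constant. Under the assumed reading, every graph
    must consist of exactly one edge, labelled with the constant's own token.
    """
    return all(
        len(edges) == 1 and edges[0][1] == token
        for token, edges in lexicon.items()
    )


if __name__ == "__main__":
    aligned = {
        "a": [("x", "a", "y")],
        "b": [("x", "b", "y")],
    }
    # The graph for token "b" carries an "a"-edge, as in the unnatural
    # derivation discussed for Fig. 4, so this lexicon is not aligned.
    unaligned = {
        "a": [("x", "a", "y")],
        "b": [("x", "a", "y")],
    }
    print(is_aligned(aligned), is_aligned(unaligned))
```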
Under this assumption, it is easy to see that any relation including (hence, ) cannot be expressed with a projective grammar.
Theorem 1.
Let be any class of projective string grammars and . For any , .
Proof 2.
Assume that there is a and an LTH such that . Given , choose such that every tree with has a subtree such that contains occurrences of and no occurrences of , for some . Such an exists according to Lemma 4. We can choose such that .
Because are aligned, contains no -edge and at least -edges. Each of these -edges is non-equivalent to all the others, and equivalent to a -edge in , so is -distant. It follows from Lemma 3 that has sources, in contradiction to the assumption that uses only sources.
5 Expressive capacity of LM-CFTG
Thm. 1 is a powerful result which shows that cannot be generated by any device for generating projective derivations using bounded placeholder memory – if we can assume that tokens and edges are aligned. We will now drop this assumption and prove that cannot be generated using a fixed set of placeholders using LM-CFTG, regardless of alignment. The basic proof idea is to enforce a weak form of alignment through the interaction of the pumping lemma with very long -chains. The result is remarkable in that LM-CFTG and TAG are weakly equivalent; they only differ in whether they must derive the strings projectively or not.
Theorem 2.
, for any .
5.1 Asynchronous derivations
Assume that , for some , with an LM-CFTG. Proving that this is a contradiction hinges on a somewhat technical concept of asynchronous derivations, which have to do with how the nodes generating edge labels such as are distributed over a derivation tree. We prove that all asynchronous derivations of certain elements of are distant (Lemma 5), and that all LM-CFTG derivations of are asynchronous (Lemma 6), which proves Thm. 2.
In what follows, let . Let us write for any tree or context and symbol , as a shorthand for , for the number of -edges in and for the maximum length of a string in which is also a substring of .
Definition 1 (-asynchronous derivation).
Let , , , We call an -asynchronous derivation iff there is a decomposition such that
[TABLE]
We call the pair an -asynchronous split of .
Lemma 5.
For any , there is a pair such that every -asynchronous with is -distant.
Proof 3.
For and , let denote the word . Let and be the unique element of such that
[TABLE]
Let be an -asynchronous derivation of ; other choices of are analogous. By definition, we can split such that has at most -edges and at least -edges. Notice first that contains at most different complete -blocks of , because each -block contains -edges. Having of them would require -edges, which is more than the -edges that can contain.
Next, consider distinct -blocks of . These blocks contain a total of -edges. Hence, the -edges of cannot be contained within only distinct blocks.
So we can find at least -edges in which are pairwise non-equivalent. There are at least edges among these which are equivalent to an edge in , because contains at most complete -blocks of . Thus, is -distant.
5.2 LM-CFTG derivations are asynchronous
So far, we have not used the assumption that is an LM-CFTL. We will now exploit the pumping lemma to show that all derivation trees of an LM-CFTG for must be asynchronous.
Lemma 6.
If is an LM-CFTL, then there exists such that for every , there exists such that is -asynchronous.
We prove this lemma by appealing to a class of derivation trees in which predicate and argument tokens are generated in separate parts.
Definition 2 (-separated derivation).
Let . A tree is -separated if we can write such that and and . The triple is called an -separation of . We call an -separation minimal if there is no other -separation of with a smaller .
Intuitively, we can use the pumping lemma to systematically remove some contexts from a . From the shape of , we can conclude certain alignments between the strings and graphs generated by these contexts and establish bounds on the number of - and -edges generated by the lower part of a separated derivation. The full proof is in the appendix; we sketch the main ideas here.
Let denote the pumping height of . There is a maximal number of string tokens and edges that a context of height at most can generate under a given yield and homomorphism. We call this number in the rest of the proof.
Lemma 7.
For , let be the length of the maximal substring of consisting in only -tokens and containing the rightmost occurrence of in . If is -separated, there exists a minimal -separation of such that, letting , .
Moreover, for any -separation , letting , .
Proof 4 (sketch).
Both statements must be established by separate inductions on the height of , although they mostly follow similar steps. We therefore focus here only on the crucial parts of the (slightly trickier) bound on . Let be a minimal -separation of and .
Base Case If , we have . We also have , so .
Induction step If , we apply Lemma 1 to to yield a decomposition , where , and . We first observe that is -separated. By induction, there exists a minimal separation with validating the bound on . Because of pumping considerations, we need to distinguish only three configurations of and . We present only the most difficult case here.
In this case and generate only one kind of bar symbol, , and brackets. One needs to examine all possible ways , and may overlap. We detail the reasoning in the case where does not overlap with or . Then, since all -tokens are generated by , projectivity of the yield and the definition of impose that the generated -tokens contribute to the rightmost -chain i.e. . Hence .
Lemma 8.
For any , if is -separated then is -asynchronous.
Proof 5.
By Lemma 7 there is a minimal -separation such that, for , the bound on and the bound on both obtain. Observe that by definition, and since generates at most -tokens, by projectivity it generates at most -tokens (one sequence of between each occurrence of and the next, plus possibly one before the first and one after the last). Thus is -asynchronous.
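The parenthetical counting argument (one sequence between each occurrence and the next, plus possibly one before the first and one after the last) can be checked on small examples. The sketch below is illustrative only: the token names `"a"` and `"|"` are hypothetical stand-ins for a core token and a bar token, and the code counts only the maximal runs of the bar token, not the bound from the lemma itself.

```python
from itertools import groupby


def count_runs(tokens, bar):
    """Count the maximal contiguous runs of `bar` in a token sequence."""
    return sum(1 for is_bar, _group in groupby(tokens, key=lambda t: t == bar) if is_bar)


def max_runs(num_core_tokens):
    """Upper bound on the number of runs when runs are separated by core tokens."""
    return num_core_tokens + 1


if __name__ == "__main__":
    tokens = list("||a|a||a|")
    core = tokens.count("a")
    # Three core tokens leave room for at most four runs of bar tokens.
    print(count_runs(tokens, "|"), max_runs(core))
```

The bound holds because two distinct maximal runs must be separated by at least one core token, so `k` core tokens can separate at most `k + 1` runs.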
Lemma 9.
For any , is -separated for some .
Proof 6 (sketch).
The proof proceeds by induction on the height of .
If , then for any , hence is trivially -separated for some .
If , Lemma 1 yields a decomposition , where , and . By induction is -separated for some . Let us assume ; the other cases are analogous. The challenge is to establish the separation of , after reinsertion of and in .
If and generate no - or -token, the distribution of - and -tokens in the tree is not affected, hence is -separated. Otherwise, due to pumping considerations, we need to distinguish three possible configurations regarding the shape of the yields of and . We present one here, see the appendix for the others; they are in the same spirit.
We consider the case where contains some -token and no -tokens, and contains some -token. Assume contains some . It follows that all -tokens are generated by . So has fewer than -tokens, by definition of it then also has fewer than -tokens, so is a -separation. Assume now that contains some . It follows that generate no -token and generate no -token. Hence is a -separation.
This concludes the proof of Lemma 6 and Thm. 2.
6 Conclusion
We have established a notion of expressive capacity in compositional semantic parsing. We have proved that non-projective grammars can express sentence-meaning relations with bounded memory that projective ones cannot. This answers an old question in the design of compositional systems: assuming projective syntax, lambda-style compositional mechanisms can be more expressive than unification-style ones, which have bounded “memory” for unfilled arguments.
From a theoretical perspective, the stronger result of this paper is perhaps Thm. 2, which shows without further assumptions that weakly equivalent grammar formalisms can differ in their semantic expressive capacity. However, Thm. 1 may have a clearer practical impact on the development of compositional semantic parsers. Consider, for instance, the case of CCG, a lexicalized grammar formalism that has been widely used in semantic parsing (Bos, 2008; Artzi et al., 2015; Lewis et al., 2016). While a potentially infinite set of syntactic categories can be used in the parses of a single CCG grammar, CCG derivations are still projective in our sense. Thus, if one assumes that derivations should be aligned (which is natural for a lexicalized grammar), Thm. 1 implies that CCG with lambda-style semantic composition is more semantically expressive than with unification-style composition. Indeed, lambda-style compositional mechanisms are the dominant approach in CCG (Steedman, 2001; Baldridge and Kruijff, 2002; Artzi et al., 2015).
Furthermore, under the alignment assumptions of Section 4, no unification-style compositional mechanism can describe string-meaning relations like . This includes neural models. For instance, most transition-based parsers (Nivre, 2008; Andor et al., 2016; Dyer et al., 2016) are projective, in that the parsing operations can only concatenate two substrings on the top of the stack if they are adjacent in the string. Such transition systems can therefore not be extended to transition-based semantic parsers (Damonte et al., 2017) without (a) losing expressive capacity, (b) giving up compositionality, (c) adding mechanisms for non-projectivity (Gómez-Rodríguez et al., 2018), or (d) using a lambda-style semantic algebra. Thus our result clarifies how to build an effective and accurate semantic parser.
We have focused on whether a grammar formalism is projective or not, while holding the semantic algebra fixed. In the future, it would be interesting to explore how a unification-style compositional mechanism can be converted to a lambda-style mechanism with unbounded placeholders. This would allow us to specify and train semantic parsers using such abstractions, while benefiting from the efficiency of projective parsers.
Acknowledgments
We are grateful to Emily Bender, Guy Emerson, Meaghan Fowlie, Jonas Groschwitz, and the participants of the DELPH-IN workshop 2018 for fruitful discussions, and to the anonymous reviewers for their insightful feedback.
Appendix A Details of the proof of Theorem 1
Lemma 4.
Let be a projective string grammar. For any there exists such that any with has a subtree such that contains occurrences of and no occurrences of , for some .
Proof 7.
Depending on , one can always choose such that any with has at least one strict subtree with .
The lemma follows by induction over the height of . It is trivially true for height 1. For the induction step, consider that must have at least occurrences of some letter because of projectivity and the shape of ; assume it is , the other cases are analogous. If has no occurrences of , we are done. Otherwise, by projectivity, contains all the ’s, i.e. . In this case, either contains occurrences of and no occurrences of , in which case we are again done, or it contains an occurrence of . In the latter case, is in the shape required by the lemma, and we can apply the induction hypothesis to identify a subtree of with occurrences of some and none of the corresponding ; and is also a subtree of .
Appendix B Details of the proof of Theorem 2
In all of the following, we assume that for some we have , where is an LM-CFTG (hence projective, i.e. ). We let and be the pumping height of .
B.1 Terminology
Let us extend the domain of to contexts: for a context , we let .
We say that a string is balanced if, for any and any position in such that there are two encompassing positions such that . We say that a tree or a context is balanced if its yield is balanced. By construction, all trees of are balanced.
For , a pumping decomposition of is a -tuple , consisting of contexts - and one tree such that , , and for any , , where we let and
B.2 Pumping considerations
Lemma 10.
Let with , and consider a pumping decomposition . Let . The two following propositions obtain:
- For any , .
- Let . For and let denote the number of -edges in . It holds for any that .
Proof 8.
Let . so which entails
[TABLE]
since by construction, we have . From there
[TABLE]
But . Plugging this into (1) yields
[TABLE]
Simplifying using (2) we find which establishes the first point. For the second point, we have from and by definition of . Similarly, since we have . Hence .
We will now present a pair of lemmas stating, in formal terms, that decompositions provided by the pumping lemma all fall within a small number of configurations:
- First, in the case where the ‘pumpable’ contexts and generate only ‘bar’ tokens and brackets in , we show that , so that only is actually pumping ‘bar’ tokens of some kind. Moreover, generates only ‘bar’ tokens and brackets as well.
- Second, we explore the alternative, where the ‘pumpable’ contexts generate some of the ‘core’ tokens in , say (for the sake of this informal presentation) some -tokens. By Lemma 10, they must generate as many -tokens, for which we can again distinguish three possible configurations:
  1. ’s and ’s are respectively generated on different sides of a single context ( and/or ), but then neither nor generate any or -tokens.
  2. generate both and -tokens (on the left and right sides respectively) and no and -tokens, while ensures generation of corresponding and -tokens (on the left and right sides respectively).
  3. Or else, one of generates the -tokens and no , or while the other generates the corresponding -tokens and no , or .
Below follows the formal presentation of these lemmas:
Lemma 11.
Let with , and consider a pumping decomposition such that for all , . There is such that all of the following hold:
1. and .
2. Either and for any , or symmetrically, and for any .
3. for any .
Proof 9.
First point: Let and . Let and assume . Pumping - -times yields a tree such that is a prefix of . We thus see that is not balanced, which is in contradiction with . A symmetric argument establishes that .
Assume now that there are two distinct such that and . Notice that, since does not contain non-bar tokens, if and occur on the same side of (for instance ) then because no string in admits as a substring, whereas does. So and must occur on distinct sides. It follows that does not generate tokens in either: if for instance for some strings and in , would be a substring of which again is a contradiction. Let now . Pumping - times yields a tree with a substring of the form (up to symmetry) which cannot be balanced, yielding a final contradiction.
Second point: , because otherwise pumping and more times than the maximum number of occurrences of a bar token in would yield an unbalanced tree. So there is a such that or . Assume for contradiction that any different token occurs on the same side of then contains a substring that cannot be found in any string of yielding a contradiction. So or . Assume , the other case is symmetric. Assume for contradiction that for some . Let . Pumping - times yields a tree such that (by projectivity) has a substring of the form where . Hence is not balanced, yielding a contradiction.
Third point: Assume for contradiction that . Assume that , the case is symmetric, and point 2 ensures that these two cases are exhaustive. Let and consider the tree obtained by pumping - times. By projectivity, has a substring of the form with and . Hence is not balanced and , yielding a contradiction.
Lemma 12.
Let with , and consider a pumping decomposition . Let such that . One of the following obtains:
1. For some , , , and .
2. , , and .
3. Either , and , or symmetrically , and .
Proof 10.
All these observations follow easily from the first point of Lemma 10 (governing the relative number of occurrences of -tokens on one hand and -tokens on the other hand), projectivity, and the following observation: a single side of or can neither generate two different kinds of tokens in nor be unbalanced. Otherwise pumping would (from projectivity) ensure that the resulting tree has a substring of a shape impossible for (for example, if both and -tokens occur on the same side of , pumping once produces a substring ).
B.3 Separation
Lemma 13.
Let and . If is -separated then so is .
Proof 11.
Consider an -separation of : . Let and be respectively obtained by removing all nodes from or from , and . One easily checks that .
Moreover, , and . Hence is -separated.
B.4 Minimality argument
Lemma 14.
Let and such that is -separated. By Lemma 13, is separated. Let be a minimal separation of and be a minimal separation of . contains all nodes of .
Proof 12.
Assume for contradiction that a node of is not in . It must then be in or . Assume that it is in , the case where it is in is analogous. Since is not in , there is a non-trivial subcontext of rooted at , i.e. with . Let be obtained by removing all nodes from or from and respectively. By definition of , , hence . Further observe that we have for some subcontext of . Since is not in or , thus . But letting , is then an -separation of which contradicts the assumed minimality of .
B.5 Inductive bounds
For any tree or context and symbol , let us write as a shorthand for , for the number of -edges generated by and the length of the rightmost maximal substring of consisting in only -tokens (more formally, , where is the unique substring such that where , if is non empty its last token is not , and ).
There is a maximal number of string tokens and edges that a context of height at most can generate under the considered yield and homomorphism. We call this number and focus from now on on -separated and -asynchronous derivations.
Below are the proofs of the two statements of Lemma 7 of the main paper (respectively, 7-1 and 7-2).
Lemma 7-1.
If is -separated and is an -separation of , then for we have
[TABLE]
Proof 13.
We prove the result by induction over the pair (with lexicographic ordering).
Base Case Assume . Then . Since , , thus which ensures the bound.
Induction step If then . We apply Lemma 1 to to yield a decomposition , where , and . Notice that cannot overlap with without overlapping with or as well, for otherwise .
As in the proof of Lemma 13, letting be obtained by removing all nodes from and from , and respectively, we obtain an -separation . We let and distinguish between possible configurations for and :
Case 0 If neither nor generates any -token, we find by induction
[TABLE]
Moreover, we have , and which concludes.
Case 1 In this case Lemma 11 applies i.e. and generate only some -tokens and brackets. The only subcase not already covered by Case 0 is the one where . Notice that . By induction,
[TABLE]
If does not overlap with or , we have and which ensures the bound. Otherwise overlaps with . If all nodes of are contained in , then by Lemma 11, generate no -token. By separation, neither does which contradicts . Hence contains all nodes of . Then by Lemma 11 again, , hence and which yields
[TABLE]
Case 2 In this case Lemma 12 applies and at least one of - generate some token . The only subcase not already dealt with in Case 0 is the one where we can set . We thus get inductively:
[TABLE]
Since or generate at least some -token, we have . Moreover since generate at most -edges, and . So we have concluding the proof.
Lemma 7-2.
If is -separated then there exists a minimal -separation of such that, letting , we have
[TABLE]
Proof 14.
We prove the result by induction over the height of .
is -separated so let us consider a minimal -separation of . Let .
Base Case Assume . Then . Since , . Moreover, and . So which ensures the bound.
Induction step If , we apply Lemma 1 to to yield a decomposition , where , and . By Lemma 13, is -separated. We let be a minimal separation of verifying the bound and . In other words, we have:
[TABLE]
By Lemma 14, contains all nodes of . We distinguish cases according to Lemmas 11 and 12.
Case 1 Consider first the case where Lemma 11 applies i.e. and generate only one kind of bar token, , and brackets. We now distinguish cases depending on the value of . Before this, we emphasize that in all subcases it holds that .
subcase i) . Since all nodes of are contained in , we have . Since and generate no -token, we have and . Substituting into inequality (3) concludes the case.
subcase ii) . We distinguish the different possible overlap of and with . Notice first that, by minimality, if any , overlaps with then contains all nodes of , for otherwise we would have with a subcontext of such that , and in that case would be a smaller -separation of since (hence ) does not generate -tokens.
Hence, in the case where overlaps with , contains all nodes of and . Since also contains all nodes of , . Moreover, . We can then conclude using inequality (3).
Consider now the case where does not overlap with or . Since all -tokens are generated by , projectivity of the yield and the definition of impose that . We further have , and substituting into inequality (3) yields which simplifies into the desired bound.
Finally, in the case where does not overlap with but does, all nodes of are contained in and all nodes of are contained in . We must then have . Otherwise, there would exist an -separation with , and . Assume . Lemma 11, point 2, ensures that or . Assume (the other case is symmetric). We then have both a and a generated on the left of . Since neither nor generate any -token, projectivity imposes and we can conclude as in the previous case. The only remaining subcase is when , in which case is [math]-separated, and considering the (minimal) [math]-separation we can use the same argument as in the case where encompasses all nodes of and .
Case 2 Consider now the remaining case where Lemma 12 applies. If neither nor generates any or -token, they do not generate or -tokens either, and the same reasoning as in Case 1, subcase i) applies. Otherwise generate at least some -token. We then have . Since contains all nodes from we further have . Finally . We conclude using inequality (3).
B.6 Conclusion
Lemma 8.
For any , if is -separated then is -asynchronous.
Proof 15.
By Lemma 7-2, there is a minimal -separation such that the bound obtains for . By Lemma 7-1 the bound obtains for as well. Observe finally that and since generates at most -tokens, by projectivity and definition of , it generates at most -tokens (one sequence of between each occurrence of and the next, plus possibly one in front of the first and one after the last). Hence,
[TABLE]
and is -asynchronous.
Lemma 9.
For any , is -separated for some .
Proof 16.
The proof proceeds by induction on the height of .
If , then for any , hence is trivially -separated for some .
If , Lemma 1 yields a decomposition , where , and . By induction is -separated for some . For the sake of succinctness, let us present the inductive step for ; the reasoning for the other cases is analogous. Let us examine the different possible configurations of and .
Case 1 If Lemma 11 applies i.e. and generate only one kind of bar token, , and brackets, one checks easily that inserting and does not change the distribution of and -tokens in the tree, hence is -separated.
Case 2 If Lemma 12 applies, note first that if and generate no or -token, we can conclude as in Case 1 as the distribution of and -tokens in the tree is not changed either. Otherwise, we assume that or generate some or -token and distinguish between subcases 1-3 of Lemma 12:
Subcase 1 In this case, for some contains an -token and no or -token while contains some -token and no or -token. Assume , the case where is similar. By projectivity and definition of , it follows that all -tokens are generated in and all -tokens in . is therefore -separated, hence -separated.
Subcase 2 In this case, contains some -token and no -token, contains some -token and no -token, contains some -token and no -token, contains some -token and no -token. It follows that generate no occurrence of and no occurrence of . Since , is an -separation.
Subcase 3 Assume contains some -token and no -token and that contains some -token. It follows that all -tokens are generated by . So contains fewer than -tokens, by definition of it also contains fewer than -tokens, so is a -separation.
Assume now contains some -token and no -token and that contains some -token. It follows that generate no -token and generate no -token. Hence is a -separation.
The remaining cases are symmetric exchanging with , with , and with everywhere.
- Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins. 2016. Globally normalized transition-based neural networks. In Proceedings of ACL.
- Yoav Artzi, Kenton Lee, and Luke Zettlemoyer. 2015. Broad-coverage CCG Semantic Parsing with AMR. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.
- Jason Baldridge and Geert-Jan M. Kruijff. 2002. Coupling CCG and Hybrid Logic Dependency Semantics. In Proceedings of the 40th ACL.
- Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for Sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse.
- Emily M. Bender. 2008. Radical non-configurationality without shuffle operators: An analysis of Wambaya. In Proceedings of the 15th International Conference on HPSG.
- Emily M. Bender, Dan Flickinger, and Stephan Oepen. 2002. The Grammar Matrix: An open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In Proceedings of the COLING Workshop on Grammar Engineering and Evaluation.
- Patrick Blackburn and Johan Bos. 2005. Representation and Inference for Natural Language. CSLI Publications.
- Johan Bos. 2008. Wide-coverage semantic analysis with Boxer. In Semantics in Text Processing. STEP 2008 Conference Proceedings. College Publications.
