Learning with Partially Ordered Representations
Jane Chandlee, Rémi Eyraud, Jeffrey Heinz, Adam Jardine, Jonathan Rawski

TL;DR
This paper introduces a novel approach to grammar learning using partially ordered string representations, enabling more flexible modeling of shared properties at string positions and improving learning efficiency.
Contribution
It presents a new model-theoretic framework for grammars with shared, multi-property positions and an algorithm that efficiently learns the most general grammar from positive examples.
Findings
Structures are shown to be partially ordered.
The learning algorithm effectively prunes the hypothesis space.
It finds the most general grammar covering the data.
Abstract
This paper examines the characterization and learning of grammars defined with enriched representational models. Model-theoretic approaches to formal language theory traditionally assume that each position in a string belongs to exactly one unary relation. We consider unconventional string models where positions can have multiple, shared properties, which are arguably useful in many applications. We show the structures given by these models are partially ordered, and present a learning algorithm that exploits this ordering relation to effectively prune the hypothesis space. We prove this learning algorithm, which takes positive examples as input, finds the most general grammar which covers the data.
Learning with Partially Ordered Representations
Jane Chandlee
Tri-Co Department of Linguistics
Haverford College
Rémi Eyraud
QARMA team, LIS
Aix-Marseille University
Jeffrey Heinz
Department of Linguistics
Institute for Advanced Computational Science
Stony Brook University
Adam Jardine
Department of Linguistics
Rutgers University
Jonathan Rawski
Department of Linguistics
Institute for Advanced Computational Science
Stony Brook University
1 Introduction
Foundational connections between formal languages, finite-state automata, and logic have been known for decades (Büchi, 1960; Thomas, 1997). Logical approaches are advantageous since they flexibly admit different representations. In many domains, such as biological sequencing or linguistics, shared properties of symbols in sequences provide information currently ignored by string-based inference algorithms, which largely focus on learning automata (de la Higuera, 2010). Here we explore the idea that domain-specific knowledge can be encoded representationally via model theory (Libkin, 2004), and show how these representations can facilitate pattern learning.
This paper synthesizes results in grammatical inference and model theory to present a novel algorithm which learns classes of formal languages using enriched representations of strings. In fact, our model-theoretic approach immediately generalizes these results to arbitrary data structures. Here we are concerned with the learning of those formal languages which can be defined via a set of structural constraints, such as the Strictly $k$-Local and Strictly $k$-Piecewise languages (Rogers and Pullum, 2011; Rogers et al., 2010). Models of strings in the languages must not contain these forbidden structures (Rogers et al., 2013). Specifically, we define a learner whose hypothesis space is structured as a partial order by the relational signature of the particular model theory. We show how to traverse this space bottom-up from positive data to find a grammar which covers the data with the most general constraints.
The paper is structured as follows: Section 2 provides mathematical preliminaries in model theory. Section 3 characterizes ordering relations over these structures. Section 4 generalizes the grammars employed in string extension and lattice-based learning (Heinz, 2010; Heinz et al., 2012) to show how these model-theoretic structures can define classes of formal languages. Section 5 discusses some entailments our learning algorithm takes advantage of. Section 6 defines a learning problem and criteria for selecting adequate solutions. Section 7 presents a general-to-specific, bottom-up algorithm which provably satisfies the learning criteria. Section 8 concludes the paper.
2 Preliminaries
2.1 Elements of Language Theory
The set of all possible finite strings of symbols from a finite alphabet and the set of strings of length are and , respectively. The unique empty string is represented with . The length of a string is , so 0. If and are two strings then we denote their concatenation with . If is a string and is the th symbol in then , so .
The set of prefixes of , , is , the set of suffixes of , , is , the set of substrings, , is , and the set of subsequences,
2.2 Elements of Finite Model Theory
Model theory, combined with logic, provides a powerful way to study and understand mathematical objects with structures (Enderton, 2001). In this paper we only consider finite relational models (Libkin, 2004) of strings in .
Definition 1 (Models).
A model signature is a tuple $\langle D; R_1, R_2, \ldots, R_n\rangle$ where the domain $D$ is a finite set, and each $R_i$ is an $m_i$-ary relation over the domain. A model for a set of objects is a total, one-to-one function from that set to structures whose type is given by a model signature.
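For example, a conventional model for strings in $\Sigma^{*}$ is given by the signature $\langle D; \lhd, [R_\sigma]_{\sigma\in\Sigma}\rangle$ and the function $M^{\lhd}$ such that $M^{\lhd}(w)=\langle D^{w}; \lhd, [R^{w}_\sigma]_{\sigma\in\Sigma}\rangle$ where $D^{w}=\{1,\ldots,|w|\}$ is the domain, $\lhd=\{(i,i+1) \mid 1\leq i<|w|\}$ is the successor relation which orders the elements of the domain, and $[R^{w}_\sigma]_{\sigma\in\Sigma}$ is a set of unary relations such that for each $\sigma\in\Sigma$, $R^{w}_\sigma=\{i\in D^{w} \mid w_i=\sigma\}$. We will usually omit the superscript $w$ since it will be clear from the context.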
For example, with $\Sigma=\{a,b,c\}$ and the model above for strings, we have
$$M^{\lhd}(abba)=\big\langle D=\{1,2,3,4\};\ \lhd=\{(1,2),(2,3),(3,4)\},\ R_{a}=\{1,4\},\ R_{b}=\{2,3\},\ R_{c}=\emptyset\big\rangle.$$
Figure 1 illustrates $M^{\lhd}(abba)$ on the left.
Another conventional model is the precedence model, with the signature $\langle D; <, [R_\sigma]_{\sigma\in\Sigma}\rangle$. It differs from the successor model only in that the order relation is defined with general precedence (Büchi, 1960; McNaughton and Papert, 1971; Rogers et al., 2013). Under this signature, the string $abba$ has the following model.
$$M^{<}(abba)=\big\langle D=\{1,2,3,4\};\ <=\{(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)\},\ R_{a}=\{1,4\},\ R_{b}=\{2,3\},\ R_{c}=\emptyset\big\rangle.$$
Figure 1 illustrates $M^{<}(abba)$ on the right.
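As a minimal sketch of the two conventional models, the following Python builds the successor and precedence structures for a string. The function names and the `(domain, order, unary)` tuple layout are assumptions made for illustration, not notation from the paper.

```python
def successor_model(w, alphabet):
    """Successor model of w: positions, immediate-successor pairs, one unary relation per symbol."""
    domain = set(range(1, len(w) + 1))
    order = {(i, i + 1) for i in range(1, len(w))}
    unary = {s: {i for i in domain if w[i - 1] == s} for s in alphabet}
    return domain, order, unary


def precedence_model(w, alphabet):
    """Precedence model of w: same domain and labels, but the order is general precedence."""
    domain = set(range(1, len(w) + 1))
    order = {(i, j) for i in domain for j in domain if i < j}
    unary = {s: {i for i in domain if w[i - 1] == s} for s in alphabet}
    return domain, order, unary


domain, succ, labels = successor_model("abba", "abc")
_, prec, _ = precedence_model("abba", "abc")
```

Here `labels["a"]` holds the positions labelled a, matching the relation written R_a in the example above, and the two models differ only in their order relation.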
The model-theoretic framework is not unique to strings. It extends to arbitrary data structures by expanding parts of the model signature. For example, Rogers (2003) describes a model-theoretic characterization of trees of arbitrary dimensionality where the domain is a Gorn tree domain (Gorn, 1967). This is a hereditarily prefix closed set D of node addresses, that is to say, for every with , where , it holds that , and for every with then .
In this view, a string may be called a one-dimensional or unary-branching tree, since it has one axis along which its nodes are ordered. In a standard tree model signature, the set of nodes is ordered by two binary relations, "dominance" and "immediate left-of". Suppose is the mother of two nodes and in some standard tree, and also assume that precedes . Then we might say that dominates the string . Standard or two-dimensional trees, then, relate nodes to one-dimensional trees (strings) by immediate dominance. A three-dimensional tree relates nodes to two-dimensional, i.e. standard trees, corresponding to Tree-Adjoining Grammar derivations. In general, a -dimensional tree is a set of nodes ordered by dominance relations such that the -th dominance relation relates nodes to -dimensional trees (for , single nodes are zero-dimensional trees).
While a Gorn tree domain as written encodes these dominance and precedence relations implicitly, we may explicitly write them out model-theoretically so that a signature for a -labeled 2- tree is where is the "immediate dominance" relation and $\prec$ is the "immediate left-of" relation. Model signatures that include transitive closure relations of each of these have also been studied.
2.3 Unconventional Word Models
Whereas Rogers (2003) generalized conventional word models to trees, here we generalize word models in a different way. Conventional string models are the successor and precedence models introduced previously. What makes these models conventional is the unary relations which essentially label each domain element with a single, mutually exclusive, property: the property of being some .
In contrast, unconventional models for strings recognize that distinct alphabetic symbols may share properties, and expand the model signature by including these properties as unary relations (Strother-Garcia et al., 2016; Vu et al., 2018). For example, a conventional model of would include 52 unary relations, one for each lowercase and capital letter. On the other hand, an unconventional model might only include 27: 26 for the letters, and one unary relation Capital. Then, letters A and a share the 'A' property and A additionally has the property of being Capital.
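The capitalization example can be sketched in Python as follows. This is an illustrative construction under the successor order; the function name `capital_model` and the relation key `"capital"` are assumptions of the sketch.

```python
import string


def capital_model(w):
    """Unconventional successor model: 26 letter relations plus one 'capital' relation."""
    domain = set(range(1, len(w) + 1))
    order = {(i, i + 1) for i in range(1, len(w))}
    # A position satisfies letter relation c whether the letter is upper or lower case.
    unary = {
        c: {i for i in domain if w[i - 1].lower() == c}
        for c in string.ascii_lowercase
    }
    # Upper-case positions additionally satisfy the shared 'capital' property.
    unary["capital"] = {i for i in domain if w[i - 1].isupper()}
    return domain, order, unary


domain, order, unary = capital_model("Aa")
# Both positions satisfy the letter relation "a"; only position 1 is capital.
```

A single position can now belong to two unary relations at once, which is exactly what the conventional models rule out.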
In linguistics, speech sounds are commonly decomposed into binary features based on their phonetic properties. So the set of segments z,Z,d,b,g,… all share the property +Voice, meaning the vocal cords are activated, while the segments s,S,t,p,k,… share the property -Voice, meaning the vocal cords are not activated. Thus unconventional models may refer to individual features in defining grammatical constraints, rather than each individual segment.
Different representations of strings and trees provide a unified perspective on well-known subclasses of the regular languages from a model-theoretic and logical perspective (Thomas, 1997; Rogers et al., 2013). However, they also open up new doors for grammatical inference by allowing one to consider other models for strings (Strother-Garcia et al., 2016; Vu et al., 2018).
3 Subfactors, Superfactors, Ideals and Filters
We sometimes refer to the model of a string as a structure. However, structures are more general in that they correspond to any mathematical structure conforming to the model signature. As such, while a model of a string will always be a structure, a structure will not always be a model of a string . The size of a structure , denoted , coincides with the cardinality of its domain.
We next wish to introduce a partial ordering over structures. To do so, we must define the terms connected, restriction, and factor. For each structure let the binary ``connectedness'' relation be defined as follows.
$$C\overset{\mathrm{def}}{=}\big\{(x,y)\in D\times D \mid \exists i\in\{1,\ldots,n\},\ \exists(x_{1},\ldots,x_{m})\in R_{i},\ \exists s,t\in\{1,\ldots,m\},\ x=x_{s},\ y=x_{t}\big\}$$
Informally, domain elements $x$ and $y$ belong to $C$ provided they belong to some non-unary relation. Let $C^{*}$ denote the symmetric transitive closure of $C$.
Definition 2 (Connected structure).
A structure is connected iff for all $x,y\in D$, $(x,y)\in C^{*}$.
For example, $M^{\lhd}(abba)$ above is a connected structure. However, the structure $S_{ab,ba}$ shown below, which is identical to $M^{\lhd}(abba)$ except that it omits the pair (2,3) from the order relation, is not connected, since none of (1,3), (1,4), (2,3) nor (2,4) belong to the symmetric transitive closure of its connectedness relation.
$$S_{ab,ba}=\big\langle D=\{1,2,3,4\};\ \lhd=\{(1,2),(3,4)\},\ R_{a}=\{1,4\},\ R_{b}=\{2,3\},\ R_{c}=\emptyset\big\rangle$$
Note that no string in $\Sigma^{*}$ has structure $S_{ab,ba}$ as its model.
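The connectedness test can be sketched as a reachability check over the pairs contributed by the non-unary relations. The function name and argument layout are assumptions; the sketch also treats every element as trivially connected to itself, which is a simplifying choice rather than something the definition spells out.

```python
def is_connected(domain, relations):
    """Check connectedness: every pair of domain elements is linked through
    the symmetric transitive closure of co-occurrence in some relation tuple.

    `relations` is an iterable of sets of tuples (the non-unary relations).
    """
    adjacency = {x: set() for x in domain}
    for relation in relations:
        for tup in relation:
            for x in tup:
                for y in tup:
                    adjacency[x].add(y)
                    adjacency[y].add(x)
    if not domain:
        return True
    # Depth-first search from an arbitrary element.
    start = next(iter(domain))
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adjacency[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == set(domain)


full = is_connected({1, 2, 3, 4}, [{(1, 2), (2, 3), (3, 4)}])   # successor model of abba
split = is_connected({1, 2, 3, 4}, [{(1, 2), (3, 4)}])          # pair (2,3) omitted
```

With the pair (2,3) removed, positions 1 and 2 can no longer reach positions 3 and 4, so the second call reports a disconnected structure.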
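Definition 3.
$A=\langle D^{A}; R_{1}^{A},\ldots,R_{n}^{A}\rangle$ is a restriction of $B=\langle D^{B}; R_{1}^{B},\ldots,R_{n}^{B}\rangle$ iff $D^{A}\subseteq D^{B}$ and for each $m$-ary relation $R_{i}$, we have $R_{i}^{A}=\{(x_{1},\ldots,x_{m})\in R_{i}^{B} \mid x_{1},\ldots,x_{m}\in D^{A}\}$.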
Informally, one identifies a subset of the domain of and strips of all elements and relations which are not wholly within . What is left is a restriction of to .
Definition 4.
Structure is a subfactor of structure () if is connected, there exists a restriction of denoted , and there exists such that for all and for all in the model signature: if and holds in then holds in . If we also say that is a superfactor of .
In other words, properties that hold of the connected structure also hold in a related way within .
If and then we say is a -subfactor of . For all , and for any model of , let the subfactors of be and the -subfactors of be . We also define to be and to be . When is understood from context, we write instead of . We define the sets of superfactors and similarly.
Observe that is a partially ordered set (poset). The next definition and lemma establish that models of strings are principal elements of ideals and filters.
Definition 5 (Ideals).
A subset $I$ of a poset $\langle A,\leq\rangle$ is an ideal iff
- $I$ is non-empty;
- for every $x$ in $I$, $y\leq x$ implies that $y$ is in $I$;
- for every $x,y$ in $I$, there exists some element $z$ in $I$ such that $x\leq z$ and $y\leq z$.
The dual of an ideal is a filter.
Definition 6 (Filters).
A subset $F$ of a poset $\langle A,\leq\rangle$ is a filter iff
- $F$ is non-empty;
- for every $x$ in $F$, $x\leq y$ implies that $y$ is in $F$;
- for every $x,y$ in $F$, there exists some element $z$ in $F$ such that $z\leq x$ and $z\leq y$.
Definition 7 (Principal Ideals, Filters and Elements).
For any poset $\langle A,\leq\rangle$ and any $x\in A$, the smallest filter containing $x$ is a principal filter and $x$ is the principal element of this filter. Similarly, the smallest ideal containing $x$ is a principal ideal and $x$ is the principal element of this ideal.
Remark 1.
Given a model of and , is a principal ideal in whose principal element is . is a principal filter in whose principal element is . The empty structure is a subfactor of every structure in .
The next two propositions show how this representational perspective unifies the treatment of substrings and subsequences. They are subfactors under the successor and precedence models, respectively. A string is a substring of iff there exists such that . String is a subsequence of iff there exists such that .
Proposition 1 (Substrings are subfactors under $M^{\lhd}$).
For all strings $u,w\in\Sigma^{*}$, $u$ is a substring of $w$ iff $M^{\lhd}(u)$ is a subfactor of $M^{\lhd}(w)$.
Proof.
Note that the result trivially holds for : we restrict ourselves to the case . Let and
($\Rightarrow$). Suppose is a substring of : there exist such that . This implies that, for all , , iff . Thus, if we set the isomorphism to be such that for , we have that is a restriction of , and therefore by definition.
($\Leftarrow$). Let be the sequence of letters and suppose : there exists an isomorphism such that is a restriction of . This means that and for all : (Definition 3). This implies that . Given that , we have and thus there exist and in such that . ∎
Proposition 2 (Subsequences are subfactors under $M^{<}$).
For all strings $u,w\in\Sigma^{*}$, $u$ is a subsequence of $w$ iff $M^{<}(u)$ is a subfactor of $M^{<}(w)$.
Proof.
We leave this proof to the reader since it is similar in nature to the previous one. ∎
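The two propositions can be checked by brute force on small strings. The sketch below, whose names and encoding are assumptions, searches for an injective map between positions that preserves the order relation and the labels, and compares the result with ordinary substring and subsequence tests. It does not check connectedness separately, since models of strings under either order are connected.

```python
from itertools import permutations


def model(w, order="successor"):
    """Return (size, order pairs, label pairs) for a conventional model of w."""
    n = len(w)
    if order == "successor":
        pairs = {(i, i + 1) for i in range(1, n)}
    else:  # general precedence
        pairs = {(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)}
    labels = {(i, w[i - 1]) for i in range(1, n + 1)}
    return n, pairs, labels


def is_subfactor(a, b):
    """True if some injective map from a's positions into b's positions
    preserves every order pair and every label of a."""
    n_a, pairs_a, labels_a = a
    n_b, pairs_b, labels_b = b
    for image in permutations(range(1, n_b + 1), n_a):
        h = dict(zip(range(1, n_a + 1), image))
        if all((h[x], h[y]) in pairs_b for x, y in pairs_a) and all(
            (h[x], s) in labels_b for x, s in labels_a
        ):
            return True
    return False


def is_subsequence(u, w):
    """Reference check: consume w left to right while matching u."""
    remaining = iter(w)
    return all(c in remaining for c in u)


for u in ["", "a", "bb", "aa", "ab", "ba", "abb", "ca"]:
    assert is_subfactor(model(u), model("abba")) == (u in "abba")
    assert is_subfactor(model(u, "precedence"), model("abba", "precedence")) == is_subsequence(u, "abba")
```

The string aa separates the two models: it is a subsequence of abba but not a substring, so it is a subfactor only under precedence. The empty string maps to the empty structure, which is a subfactor of every structure.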
4 Grammars, Languages, and Language Classes
Factors can define grammars, formal languages, and classes of formal languages. Usually a model signature provides the vocabulary for some logical language. Sentences in this logical language define sets of strings as follows. The language of a sentence is all and only those strings whose models satisfy . Within the regular languages, many well-known subregular classes can be characterized logically in this way (McNaughton and Papert, 1971; Rogers and Pullum, 2011; Rogers et al., 2013; Thomas, 1997).
Intuitively, the grammars we are interested in consist of a finite list of forbidden subfactors, whose largest size is bounded by $k$. Strings in the language of this grammar are those which do not contain any forbidden subfactors. In this way these grammars are like logical expressions which are "conjunctions of negative literals" (Rogers et al., 2013), where the role of the negative literals is played by the forbidden factors.
Each forbidden subfactor is a principal element of a filter and the language is all strings whose models are not in any of these filters. For each $k$, there is a class of languages including all and only those languages that can be defined in this way. For example, the Strictly $k$-Local (SL$_k$) and Strictly $k$-Piecewise languages can be defined in this way; they are languages which forbid finitely many substrings or subsequences, respectively (Garcia et al., 1990; Rogers et al., 2010). Formally:
Definition 8.
Let be some positive integer, and a model of with signature . A grammar is a subset of . The language of is . The class of languages .
The elements of are principal elements of filters, and are called forbidden subfactors.
As an example, let and consider . includes the strings and and no other strings, because the substrings , , and are all forbidden. This language belongs to .
Proposition 3.
For each and each , has an empty intersection with .
Proof.
Suppose there exists such that and . This implies that and thus that which contradicts Definition 8. ∎
In other words, the principal ideal of is disjoint from the principal filters of the elements of .
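To make the definition of a language concrete, here is a minimal sketch for the conventional successor model, where forbidden subfactors correspond to forbidden substrings (Proposition 1). The grammar `{"aa", "bb"}` over the alphabet `{a, b}` is a hypothetical example chosen for this sketch, not the grammar discussed above, and the function names are assumptions.

```python
from itertools import product


def in_language(w, grammar):
    """A string is in the language iff it contains no forbidden substring."""
    return not any(factor in w for factor in grammar)


def language_up_to(alphabet, grammar, n):
    """Enumerate all strings of length at most n that the grammar permits."""
    return [
        "".join(symbols)
        for length in range(n + 1)
        for symbols in product(alphabet, repeat=length)
        if in_language("".join(symbols), grammar)
    ]


G = {"aa", "bb"}  # hypothetical grammar of forbidden 2-factors
permitted = language_up_to("ab", G, 3)
# permitted == ["", "a", "b", "ab", "ba", "aba", "bab"]
```

Every permitted string avoids both forbidden factors, so its set of substrings is disjoint from the grammar, which is the content of Proposition 3 in this special case.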
5 Grammatical Entailments
Given a grammar , we call a subfactor in ungrammatical if it belongs to a principal filter of any element of . Subfactors that are not ungrammatical are called grammatical. Lemma 14 ensures that grammaticality is downward entailing, in the sense that if a model of the word is not contained in the principal filters of the elements of the grammar, then neither are the subfactors of . But it also ensures that ungrammaticality is upward entailing: if a model of the word belongs to the principal filters of the elements of the grammar, then all of the superfactors of in that filter are likewise contained.
In this way, the ideals and filters within a particular model noted above give rise to these entailment properties of grammaticality with respect to the hypothesis space. If the learner constructs filters, then the grammar will allow structures such that language membership is downward entailing with respect to the grammar , and language non-membership is upward entailing with respect to the grammar .
5.1 Example: Text Capitalization
As an example, consider capitalized letters as discussed above. In an unconventional word model, each capital letter at some position is represented as satisfying one of the relations as well as the unary relation . Thus the relation is true of both lowercase and uppercase , but is only true of uppercase . Note also that in this model no position of a structure can satisfy both predicates and . We return to this point in §7.
Figure 2 showcases the relationship among these structures under a model . The structure for , , contains as subfactors , , [], and the empty structure (not shown). The empty structure is a subfactor of [], and [] in turn is a subfactor of and . The subfactor contains the subfactor [], the domain element with no relations, but has superfactors [capital,a], which has one domain element and two relations, and [a][], which has two domain elements, and the first satisfying the property a. Subfactors and superfactors are listed above and below each other, respectively, with lines between them. Members of one ideal are noted with a blue checkmark, and members of a filter are noted by a red asterisk.
Applying this to the example in Figure 3, if the structure is grammatical, then all of its subfactors, such as [capital], [a], and [], are grammatical. Since those are grammatical, each of their subfactors is also grammatical, which in this case is just [], shown in blue in Figure 3. Conversely, if the structure [a][] is known to be ungrammatical, then any structure which has it as a subfactor is also ungrammatical (in this example, [capital,a][], shown in red in Figure 3). To see the importance, consider a string with only lowercase letters. In a conventional model, the grammar would ban 26 forbidden factors (A,B,C,…), but the "capital" model bans just one, [capital].
5.2 Example: Long Distance Linguistic Dependencies
As another example, sequences of speech sounds as mentioned earlier may be decomposed into binary features based on their phonetic properties like anteriority (ant: whether it occurs in the anterior of the vocal tract), stridency (str: whether it produces a high-intensity fricative noise), or voicing (voi: whether it activates the vocal cords), among others (Hayes, 2009). Each sound at some position is represented as satisfying relations .
Thus the relation +str is true of both the sound s, as in the first sound of "sue", and S, as in "shoe", but -ant is only true of S.
Note also that in this model no position of a structure can satisfy both predicates and . We return to this point in §7 below. We again use square brackets to delimit the domain elements and write the unary features within them, so a model representation like has the following visual representation:
[+str,+ant][+str,-ant]
To ease the exposition, we will use square brackets to delimit the domain elements and write the unary relations within them instead of specifying the model in mathematical detail. In an unconventional subsequence word model, then, one possible structure of the subsequence s…S is written [+str,+ant][+str,-ant].
In many languages, the presence of certain segments is dependent on the presence of another segment. In Samala, subsequences like s…s are allowed but s…S are not, so words like hasxintilawas are allowed but words like hasxintilawaS are not (Hansson, 2010). In an unconventional model, banning structures of the form [+str][+str] is insufficient, since all these segments share the stridency property, while a structure like [+str,+ant][+str,-ant] will distinguish them, since it disallows stridents which disagree on the ant relations. The structure [+ant][-ant], however, is insufficient, since consonants like p,b,m have that feature, and it would incorrectly ban acceptable strings. To see the importance, a conventional string model must ban multiple sibilant factors sS,zS,sZ,zZ, while an unconventional model must ban just one, [+str,+ant][+str,-ant].
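The contrast between the three candidate constraints can be sketched as follows. The feature table is a toy assumption covering only the segments needed for the two example words (segments not listed are left unspecified), and `contains_factor` is a hypothetical helper that tests whether a sequence of feature bundles occurs as a subsequence under precedence.

```python
# Toy feature table (an assumption for illustration, not a full phonological analysis).
FEATURES = {
    "s": {"+str", "+ant"},
    "S": {"+str", "-ant"},
    "t": {"-str", "+ant"},
    "n": {"-str", "+ant"},
    "l": {"-str", "+ant"},
    "x": {"-str", "-ant"},
}


def contains_factor(word, factor):
    """True if the bundles in `factor` are matched, in order, by positions of
    `word` whose features include each bundle (precedence, not adjacency)."""
    pos = 0
    for bundle in factor:
        # Greedily take the leftmost remaining position that satisfies the bundle.
        while pos < len(word) and not bundle <= FEATURES.get(word[pos], set()):
            pos += 1
        if pos == len(word):
            return False
        pos += 1
    return True


DISAGREE = [{"+str", "+ant"}, {"+str", "-ant"}]

ok_word, bad_word = "hasxintilawas", "hasxintilawaS"
print(contains_factor(ok_word, DISAGREE), contains_factor(bad_word, DISAGREE))
print(contains_factor(ok_word, [{"+str"}, {"+str"}]))   # too coarse: matches s...s
print(contains_factor(ok_word, [{"+ant"}, {"-ant"}]))   # too coarse: matches s...x
```

Under this toy table only the two-feature constraint separates the attested word from the unattested one; each single-feature constraint also matches the attested word and would wrongly exclude it.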
Figure 4 showcases the relationship among these structures under a precedence model . The structure for contains as subfactors (among others) , , [], and the empty structure (not shown). The empty structure is a subfactor of [], and [] in turn is a subfactor of and , and so on. If the structure is grammatical, then all of its subfactors are grammatical, and so are their subfactors, in turn. Conversely, if the structure is known to be ungrammatical, then any structure which has it as a subfactor is also ungrammatical (for example, , where the first segment is also voiced +voi), shown in red in Figure 4.
The ideals and filters over structures give the learner an advantage when confronting hypothesis spaces under a particular model. In particular, they allow the learner to prune vast swathes of the hypothesis space as it searches for the principal elements of filters. If a learner identifies one structure as being grammatical, the learner may infer that all of its subfactors are also grammatical and need not consider them. Alternatively, if the learner knows a structure is ungrammatical, it may infer that the superfactors above it are also ungrammatical.
Generally, these reductions can be exponential: an alphabet of size can be represented with unary relations in the model signature. However, this exponential reduction does not necessarily make learning any easier. The reason for this is that the size of equals where is the number of unary relations. Since a grammar is defined as a subset of , the number of considered grammars is thus very large. Therefore, the problem of how to search this space effectively is paramount.
6 The Learning Problem
For some , is learnable from positive data? The short answer is Yes (Heinz, 2010; Heinz et al., 2012). The solution presented in these papers can be thought of as using the function to identify permissible -factors in words in the positive data. The -factors that are not permissible are forbidden. With sufficient positive data, such a learning algorithm will converge to a grammar that generates any target language in the class. While this solution is sound in theory, when the space of -factors is very large, it is not practical. Here, we make clear the problem the learning algorithm solves.
We state the learning problem not in terms of converging to a correct grammar in the limit, as previously studied, but instead in terms of returning an 'adequate' grammar given a finite positive sample. Determining what counts as an adequate grammar is what De Raedt (2008) calls a Quality Criterion.
Definition 9 (The Learning Problem).
Fix , model , and positive integer . For any language and for any finite , return a grammar such that
1. is consistent, that is, it covers the data: ;
2. is a smallest language in which covers the data: so for all where , we have ; and
3. includes structures that are restrictions of structures included in other grammars that also satisfy (1) and (2): for all satisfying the first two criteria for all , there exists such that .
The first criterion is self-explanatory. The second criterion is motivated by Angluin's (1980) analysis of identification in the limit. The third criterion requires that the grammar contain the most ``general'' subfactors. An example will help illustrate this criterion.
Consider again the grammar with . is the same as where . In all the forbidden subfactors are of size 2, whereas encapsulates all of the 2-factors in which include with a single 1-factor . Both grammars and may satisfy criteria (1) and (2) but would not satisfy criterion (3) because of .
7 A Bottom-Up Learning Algorithm
De Raedt (2008) identifies two directions of inference: specific-to-general (i.e., 'top-down') and general-to-specific (i.e., 'bottom-up'). Since we are trying to find the most general subfactors, top-down inference has the potential to consider exponentially many more subfactors than bottom-up inference.
It makes more sense to traverse bottom-up, that is, from the most general subfactors possible to the most specific.
Additionally, once a subfactor is identified as an element of the grammar, none of its superfactors (elements of its principal filter) need to be considered further.
A bottom-up learner is shown in Algorithm 1. Its input is a positive data sample and an integer that identifies the upper bound on the size of the subfactors.
The algorithm makes use of a queue , which is initialized to contain just the empty structure . It also initializes two empty sets: , the grammar that will ultimately be returned, and , the set of `visited subfactors'. The subfactors in are considered one at a time, in order, and as each subfactor is considered it is added to . If is not a subfactor of the model of any word in the positive sample (i.e., not contained by any data point in ), then it is added to the grammar .
If is a subfactor of the sample, it is sent to the function , which returns a set of least superfactors for . For concreteness, may be defined formally as follows:
.
Practically will be defined constructively so that each subfactor in is constructed only once as needed. Thus, not only will it not be needed to store the whole set in memory, but the set may be excluded from the algorithm as well.
This set of superfactors is then filtered by the following criteria: they must be smaller than , they must contain no element of as a subfactor, and they must not have been previously considered (i.e., they cannot be in ). Those structures that survive this filter are added to . This procedure continues until there are no more structures left to consider in .
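The procedure described above can be sketched in Python under several simplifying assumptions that are not taken from the paper: structures are restricted to the successor order and encoded as tuples of feature bundles, `next_superfactors` grows a structure by one empty position or one feature, bundles that no symbol can realize are skipped, and candidates are marked as visited when they are enqueued. The sketch also re-checks each candidate against the grammar when it is dequeued, a defensive choice so that no returned factor contains another. All names are hypothetical.

```python
from collections import deque


def word_model(w, symbols):
    """Encode a word as a tuple of feature bundles, one per position."""
    return tuple(symbols[c] for c in w)


def is_subfactor(s, t):
    """s matches some contiguous window of t, bundle by bundle (subset test)."""
    return any(
        all(a <= b for a, b in zip(s, t[i:i + len(s)]))
        for i in range(len(t) - len(s) + 1)
    )


def next_superfactors(s, symbols):
    """Least superfactors: add an empty position at either edge, or add one
    feature to one position (only if some symbol carries the resulting bundle)."""
    features = set().union(*symbols.values())
    out = {s + (frozenset(),), (frozenset(),) + s}
    for i, bundle in enumerate(s):
        for f in features - bundle:
            new = bundle | {f}
            if any(new <= fs for fs in symbols.values()):
                out.add(s[:i] + (new,) + s[i + 1:])
    return out


def learn(data, k, symbols):
    """Breadth-first, bottom-up search for the most general forbidden factors."""
    models = [word_model(w, symbols) for w in data]
    queue, visited, grammar = deque([()]), {()}, []
    while queue:
        s = queue.popleft()
        if any(is_subfactor(g, s) for g in grammar):
            continue  # already inside the filter of a known forbidden factor
        if not any(is_subfactor(s, m) for m in models):
            grammar.append(s)  # unattested in the sample: forbid it
            continue
        for t in next_superfactors(s, symbols):
            if len(t) <= k and t not in visited and not any(
                is_subfactor(g, t) for g in grammar
            ):
                visited.add(t)
                queue.append(t)
    return grammar


SYMBOLS = {
    "a": frozenset({"a"}), "A": frozenset({"a", "capital"}),
    "b": frozenset({"b"}), "B": frozenset({"b", "capital"}),
}
G = learn(["ab", "ba"], 2, SYMBOLS)
```

On this lowercase-only sample with k = 2, the sketch returns three factors: [capital], [a][a], and [b][b]. The single factor [capital] stands in for every capitalized structure, so none of its superfactors is explored further.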
Theorem 1.
For any , and any finite set provided as input to Algorithm 1, it returns a grammar satisfying Definition 9.
Proof.
Consider any . Algorithm 1 only adds elements to that are not subfactors of , so . Thus and , satisfying Condition (1).
Consider any with . To show , consider any . Then and since . Then . Hence, , and so , satisfying Condition (2).
For condition (3), we use the fact that elements in the grammar were in Q at some point. Suppose are subfactors such that , , and (. Since , then at some point .
If then will be added to before is generated by . Because is a queue, will also be removed from before is generated by . Since is not contained by any with , it will be added to . When is generated by , it will not pass the filter because it fails the second criterion since and . Then is never added to , and therefore , contra our original assumption. Thus Condition (3) is satisfied. ∎
One aspect of the algorithm to highlight is that when a subfactor is added to , it is not added to . Consequently, is never added to . In this way, finding elements of prunes the remainder of the space to be searched (see figure 5). In general, it is not the case that every element in the principal filter of will not be generated by since some of these elements may belong to for other subfactors on the . We expect subfactors on the `border' of to be generated in this way (and then they are filtered out). This pruning, especially when the subfactors are quite general, can significantly reduce the remaining space to be traversed.
In regard to efficiency, in the worst case, the elements of are all very specific subfactors and are greatest elements of . In this case, every subfactor will be added to and the time complexity is thus exponential. However, we are primarily interested in the case when are a small proportion of . This constitutes an example of data sparsity. In this case, we believe the elements of the target grammar will be much `lower' in the partial order and thus will be found much more quickly. Determining what conditions on and result in a polynomial time run in the size of is a focus of current research activity.
Another area of active research is developing a recipe for the function for models with a successor or precedence order relation and arbitrary unary relations. The basic idea underlying the bottom-up algorithm is to develop a spanning tree for the poset and to traverse this tree in a breadth-first manner. The function helps control this search. Ideally, would only generate each subfactor once, which obviates the need to store visited subfactors in . This can be accomplished to some extent in different ways. For incompatible unary relations, like and in our capitalization example, can be defined to prevent adding property to a position that already satisfies property .
For compatible unary relations, like and in our capitalization example, an ordering over the unary relations such as can help eliminate generating the same subfactor in different ways. For example, if is defined to only add `lesser' unary relations to positions that already have them then it would only output [] given the subfactor [] as input. On the other hand, when given as input the subfactor [], it could not add any unary relation to this position.
8 Conclusion
In this paper, we considered the problem of learning formal languages defined as the complement of the union of finitely many principal filters, whose principal elements make up the grammar. This is one way to characterize the Strictly $k$-Local and Strictly $k$-Piecewise languages, but the generalization here lets us consider enriched representations of strings where different elements in a string can be said to share properties. It also lets us learn the shortest forbidden substrings in (Ron et al., 1996). This is useful in many applications where domain-specific knowledge is available and should be taken advantage of. Such enriched representations, however, have a drawback. The number of subfactors is large, which makes identifying the principal elements of the filters difficult. This paper showed that the partial ordering of the subfactors motivates a bottom-up learning algorithm which finds the least subfactors whose filters do not include the positive data.
Acknowledgments
We would like to thank James Rogers for very helpful discussion on the notion of subfactor. This work was supported by NIH grant #R01HD87133-01 to JH.
References
- Dana Angluin. 1980. Inductive inference of formal languages from positive data. Information and Control, 45:117–135.
- J. Richard Büchi. 1960. Weak second-order arithmetic and finite automata. Mathematical Logic Quarterly, 6(1-6):66–92.
- Luc De Raedt. 2008. Logical and Relational Learning. Springer-Verlag Berlin Heidelberg.
- Herbert B. Enderton. 2001. A Mathematical Introduction to Logic, 2nd edition. Academic Press.
- Pedro Garcia, Enrique Vidal, and José Oncina. 1990. Learning locally testable languages in the strict sense. In Proceedings of the Workshop on Algorithmic Learning Theory, pages 325–338.
- Saul Gorn. 1967. Explicit definitions and linguistic dominoes. In Systems and Computer Science, pages 77–115, Toronto. University of Toronto Press.
- Gunnar Hansson. 2010. Consonant Harmony: Long-Distance Interaction in Phonology. Number 145 in University of California Publications in Linguistics. University of California Press, Berkeley, CA. Available on-line (free) at eScholarship.org.
- Bruce Hayes. 2009. Introductory Phonology. Wiley-Blackwell.
