This paper investigates how bounding variable frequency in pattern languages affects the complexity of learning and teaching these patterns, focusing on the minimum number of examples needed for unique identification in different models.
Contribution
It introduces the study of variable frequency bounds in pattern languages and analyzes their impact on teaching complexity in cooperative learning models.
Findings
1. Bounding variable frequency influences the teaching dimension of pattern classes.
2. The paper provides bounds on the number of examples needed for pattern identification.
3. It compares teaching complexity across different models with variable frequency restrictions.
Table 1: TD and PBTD of various pattern classes. In each entry, m ≥ 1, the universal (resp. existential) quantifier is taken over all patterns belonging to the class in the corresponding row and Π refers to the class in the corresponding row.
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Semigroups and Automata Theory
Full text
The Teaching Complexity of Erasing Pattern Languages With Bounded Variable Frequency
Ziyuan Gao
Department of Mathematics, National University of Singapore
10 Lower Kent Ridge Road, Singapore 119076, Republic of Singapore
Abstract
Patterns provide a concise, syntactic way of describing a set of strings, but their
expressive power comes at a price: a number of fundamental decision problems concerning
(erasing) pattern languages, such as the membership problem and inclusion problem, are known to be
NP-complete or even undecidable, while the decidability of the equivalence problem is still open; in
learning theory, the class of pattern languages
is unlearnable in models such as the distribution-free (PAC)
framework (if P/poly ≠ NP/poly). Much work on the algorithmic learning of pattern languages has thus focussed
on interesting subclasses of patterns for which positive learnability results may be achieved.
A natural restriction on a pattern is a bound on its variable frequency – the maximum number m such
that some variable occurs exactly m times in the pattern. This paper examines the effect of
limiting the variable frequency of all patterns belonging to a class Π
on the worst-case minimum number of labelled examples needed to uniquely identify any pattern
of Π in cooperative teaching-learning models.
Two such models, the teaching dimension model as well as the preference-based teaching model,
will be considered.
1 Introduction
In the context of this paper, a pattern is a string made up of symbols from two disjoint sets,
a countable set X of variables and an alphabet Σ of constants. The non-erasing
pattern language generated by a pattern π is the set of all words obtained by substituting
nonempty words over Σ for all the variables in π, under the condition that for any
variable, all of its occurrences in π must be replaced with the same word; the erasing pattern
language generated by π is defined analogously, the only difference being that the variables in
π may be replaced with the empty string.
Unless stated otherwise, all pattern languages in the present paper refer to erasing pattern languages.
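The two substitution regimes described above can be made concrete with a small brute-force membership test. The following is only an illustrative sketch: the function name `in_pattern_language`, the list-of-symbols encoding of patterns and the `erasing` flag are assumptions made for this example and do not appear in the paper. The search takes exponential time in the worst case.

```python
from typing import Dict, List, Set


def in_pattern_language(pattern: List[str], word: str,
                        variables: Set[str], erasing: bool = True) -> bool:
    """Brute-force membership test for the language of `pattern`.

    `pattern` is a list of symbols; a symbol listed in `variables` is a
    variable, any other symbol is treated as a constant string.
    With erasing=True variables may be replaced by the empty word.
    """
    binding: Dict[str, str] = {}

    def go(i: int, pos: int) -> bool:
        if i == len(pattern):
            return pos == len(word)
        sym = pattern[i]
        if sym not in variables:
            # Constant: it must occur literally at the current position.
            return word.startswith(sym, pos) and go(i + 1, pos + len(sym))
        if sym in binding:
            # Repeated variable: reuse the word chosen for its first occurrence.
            val = binding[sym]
            return word.startswith(val, pos) and go(i + 1, pos + len(val))
        first_end = pos if erasing else pos + 1
        for end in range(first_end, len(word) + 1):
            binding[sym] = word[pos:end]
            if go(i + 1, end):
                return True
        binding.pop(sym, None)
        return False

    return go(0, 0)
```

For instance, with the hypothetical pattern `["x", "0", "x"]` the word `0` belongs to the erasing language (take x = ε) but not to the non-erasing one.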
In computational learning theory, the non-erasing pattern languages were introduced by Angluin [3] as a motivating example for her work on the identification of uniformly decidable families of languages in the
limit. Shinohara [35] later introduced the class of erasing pattern languages, proving that
the class of all such languages generated by regular patterns (patterns in which every variable occurs at most once)
is polynomial-time learnable in the limit.
Patterns and allied notions, such as that of an extended regular expression [1, 9, 33, 14],
which has more expressive power than a pattern, have also been studied in other fields, including word
combinatorics and pattern matching. For example, the membership problem for pattern languages is closely related to the problem of matching ‘patterns’ with variables (based on various definitions
of ‘pattern’) in the pattern matching community [6, 2, 12, 10, 11].
The present paper considers the problem of uniquely identifying pattern languages from
labelled examples based on formal teaching-learning models, where a labelled example for a pattern
language L is a pair (w,∗) such that ∗ is “+” if w belongs to L and
“−” otherwise. We shall study two such models in the computational learning theory literature: the well-known teaching dimension (TD) model [19, 34]
and the preference-based teaching (PBT) model [17] (cf. Section 3). Given a model T and any class
Π of patterns to be learnt, the maximum size of a sample (possibly ∞) needed for a learner to
successfully identify any pattern in Π based on the teaching-learning algorithm of T is
known as the teaching complexity of Π (according to T). The broad question we try to partly address is: what properties
of the patterns in a given class Π of patterns influence the teaching complexity of Π
according to the TD and PBT models? More specifically, let Π_m be a class of patterns π such that the maximum number
of times any single variable occurs in π (known here as the variable frequency
of π) is at most m; how does the teaching complexity of Π_m vary with
m? The variable frequency of a pattern is quite a natural parameter that has been
investigated in other problems concerning pattern languages. For example,
Matsumoto and Shinohara [27] established an upper bound on the query complexity of
learning (non-erasing) pattern languages in terms of the variable frequency of the
pattern and other parameters; Fernau and Schmid [13] proved that the membership problem for patterns remains
NP-complete even when the variable frequency is restricted to 2 (along with other
parameter restrictions).
In this paper, one motivation for concentrating on the variable frequency of a pattern
rather than, say, the number of distinct variables occurring in the pattern, comes
from examining the teaching complexity of some basic patterns. Take the constant
pattern 0, where 0 is a letter in the alphabet Σ of constants. The language generated
by this pattern cannot be finitely distinguished (i.e., distinguished
using a finite set of labelled examples) from every other pattern
language, even if one only considers languages generated by patterns with at most one variable. Indeed, any finite set {(0,+),(w_1,−),…,(w_k,−)} of labelled examples for the pattern 0 is also
consistent with the pattern 0x^m where m = max_{1≤i≤k} |w_i|.
The latter observation depends crucially on the
fact that a variable may occur any number of times in a pattern, and less so on the number of distinct
variables occurring in a pattern. A similar remark applies to the pattern languages generated by
patterns with a constant part of length at least 2
[7, Theorem 3].
On the other hand, if one were to teach the singleton
language {0} w.r.t. all languages generated by patterns with variable frequency at most k for
some fixed k, then a finite distinguishing set for {0} could consist of (0,+) plus all negative
examples (0^n,−) with 2 ≤ n ≤ k+1. This seems to suggest that the maximum variable
frequency of the patterns in a class of patterns may play a crucial role in determining
whether or not the languages generated by members of this class are finitely distinguishable.
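The two observations above can be checked mechanically for one-variable rivals of the shape 0x^j. The sketch below is a limited illustration, not the paper's argument: the helper names `in_lang_0_x_pow` and `rival_exponent` are hypothetical, and only patterns of the form 0x^j are examined rather than all patterns with bounded variable frequency.

```python
from typing import List


def in_lang_0_x_pow(word: str, m: int) -> bool:
    """Membership in L(0 x^m): word = '0' + u * m for some (possibly empty) u."""
    if m < 1 or not word.startswith("0"):
        return False
    rest = word[1:]
    if len(rest) % m != 0:
        return False
    u = rest[: len(rest) // m]
    return u * m == rest


def rival_exponent(negatives: List[str]) -> int:
    """Exponent m such that 0 x^m agrees with (0,+) and all given negatives.

    Every word of L(0 x^m) other than '0' has length at least m + 1, so
    choosing m as the maximum length of a negative example suffices.
    """
    return max([1] + [len(w) for w in negatives])


# Unbounded frequency: any finite sample for the pattern 0 has a rival 0 x^m.
sample_negatives = ["", "1", "00", "0100"]
m = rival_exponent(sample_negatives)
rival_is_consistent = in_lang_0_x_pow("0", m) and not any(
    in_lang_0_x_pow(w, m) for w in sample_negatives)

# Frequency at most k: the negatives 0^n (2 <= n <= k+1) exclude 0 x^j, j <= k.
k = 3
bounded_negatives = ["0" * n for n in range(2, k + 2)]
all_rivals_excluded = all(
    any(in_lang_0_x_pow(w, j) for w in bounded_negatives)
    for j in range(1, k + 1))
```

Here `rival_is_consistent` records that the chosen 0x^m survives the finite sample, while `all_rivals_excluded` records that each 0x^j with j ≤ k wrongly accepts one of the negative examples 0^n.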
The first section of this work studies the teaching complexity of simple block-regular
patterns, which are equivalent to patterns of the shape x_1 a_1 x_2 a_2 … a_{n−1} x_n,
where x_1,…,x_n are distinct variables and a_1,…,a_{n−1} are constants.
They make up one of the simplest, non-trivial classes of patterns that have a restriction on
the variable frequency. Bayeh et al. [7] showed that over alphabets of size
at least 4, the languages generated by such patterns are precisely those
that are finitely distinguishable; we refine this result by determining, over any alphabet, the
TD and PBT dimensions of the class of simple block-regular patterns. Further, we calculate
the TD of these patterns w.r.t. the class of regular patterns and provide
an asymptotic lower bound for the TD of any given simple block-regular pattern w.r.t. the whole class of patterns. In the subsequent section, we proceed to the more general problem of determining, for
various natural classes Π of patterns that have a uniformly bounded variable frequency,
those members of Π that are finitely distinguishable.
It will be proven that all m-quasi-regular patterns (i.e. every variable of the pattern
occurs exactly m times) and m-regular (i.e. every variable
occurs at most m times) non-cross patterns are finitely distinguishable w.r.t. the class of
m-quasi-regular and m-regular non-cross patterns respectively; moreover, the TD of the class of m-regular non-cross patterns is even finite and in fact sublinear
in m. Next, we present partial results on the problem of determining the subclass of
m-regular patterns that have a finite TD. Over any infinite alphabet, every m-regular pattern is finitely distinguishable,
which contrasts quite sharply with the previously mentioned theorem that over alphabets with at
least 4 letters, the only patterns with a finite TD are the simple block-regular
ones.
Over binary alphabets, on the other hand, there are patterns that are not finitely distinguishable even when
the variable frequency is restricted to 4.
Due to space constraints, most proofs have been deferred to the appendix.
2 Preliminaries
N_0 denotes the set of natural numbers {0,1,2,…} and N = N_0 ∖ {0}.
Let X = {x_1, x_2, x_3, …} be an infinite set of variable symbols. An alphabet is a finite or countably infinite set of symbols, disjoint from X. Fix an alphabet Σ. A pattern is a nonempty finite string over
X ∪ Σ. The class of patterns over any alphabet Σ with z = |Σ|
is denoted by Π^z; this notation reflects the fact that all the properties of patterns
and classes of patterns considered in the present work depend only on the size of the alphabet
and not on the actual letters of the alphabet. The erasing pattern language L(π) generated by a pattern π over Σ
consists of all strings generated from π when replacing variables in π with any string over Σ,
where all occurrences of a single variable must be replaced by the same string [35]. Patterns π and τ over Σ are said to be equivalent
iff L(π) = L(τ);
they are similar iff π = α_1 u_1 α_2 u_2 … u_n α_n
and τ = β_1 u_1 β_2 u_2 … u_n β_n for some u_1, u_2, …, u_n ∈ Σ^+
and α_1,…,α_n, β_1,…,β_n ∈ X*. Unless specified otherwise, we identify any pattern π belonging
to a class Π of patterns with every other π′ ∈ Π such that L(π) = L(π′).
Var(π) (resp. Const(π)) denotes the
set of all distinct variables (resp. constant symbols) occurring in π.
For any symbol a and n ∈ N_0,
a^n denotes the string equal to n concatenated copies of a. For any alphabets A and B, a morphism
is a function h: A* → B* with h(uv) = h(u)h(v)
for all u, v ∈ A*. A substitution is a morphism h: (Σ ∪ X)* → Σ*
with h(a) = a for all a ∈ Σ.
By abuse of notation, we will often use the same symbol
h to represent the morphism (X ∪ Σ)* → Σ*
that coincides with the substitution h on individual variables
and with the identity function on letters from Σ.
I_{h,π} denotes the mapping of closed intervals of positions of
π to closed intervals of positions of h(π)
induced by h;
π(ε) denotes the word obtained from π by substituting
ε for every variable in π.
Let ⊑ denote the subsequence relation on Σ*: u ⊑ v holds iff there are numbers i_1 < i_2 < … < i_{|u|}
such that v_{i_j} = u_j for all j ∈ {1,…,|u|}.
Given any u, v ∈ Σ*, the shuffle product of u and v,
denoted by u ⧢ v, is the set {u_1 v_1 u_2 v_2 … u_k v_k : u_i, v_i ∈ Σ* ∧ u_1 u_2 … u_k = u ∧ v_1 v_2 … v_k = v}.
Given any A, B ⊆ Σ*, the shuffle product of A
and B, denoted by A ⧢ B, is the set ⋃_{u∈A ∧ v∈B} u ⧢ v.
If A = {u}, we will often write A ⧢ B
as u ⧢ B.
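The subsequence relation and the shuffle product defined above are easy to experiment with on short words. The following is a minimal sketch, assuming words are plain Python strings; the function names `is_subsequence`, `shuffle` and `shuffle_sets` are illustrative choices and not notation from the paper.

```python
from functools import lru_cache
from typing import FrozenSet, Iterable, Set


def is_subsequence(u: str, v: str) -> bool:
    """u ⊑ v: the letters of u occur in v in the same order."""
    it = iter(v)
    return all(ch in it for ch in u)  # `in` consumes the iterator left to right


@lru_cache(maxsize=None)
def shuffle(u: str, v: str) -> FrozenSet[str]:
    """All interleavings of u and v that keep the letter order of each word."""
    if not u:
        return frozenset({v})
    if not v:
        return frozenset({u})
    return frozenset({u[0] + w for w in shuffle(u[1:], v)} |
                     {v[0] + w for w in shuffle(u, v[1:])})


def shuffle_sets(a: Iterable[str], b: Iterable[str]) -> Set[str]:
    """Shuffle product of two finite sets of words."""
    b = list(b)
    return set().union(*(shuffle(u, v) for u in a for v in b))
```

Every word in `shuffle(u, v)` contains both `u` and `v` as subsequences, which is the property used when a pattern is shuffled with additional variables later in the paper.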
3 Teaching Dimension and Preference-based Teaching Dimension
Machine teaching focusses on the problem of designing, for any given learning algorithm, an optimal training
set for every concept belonging to a class of concepts to be learnt [36].
Such a training set is sometimes known as a teaching set.
In this work, an “optimal” teaching set for a pattern π is one that has the minimum number of
examples labelled consistently with π needed for the algorithm to successfully identify π (up to equivalence).
We study the design of optimal teaching sets for various classes of pattern languages w.r.t. (i) the classical teaching dimension model [19, 34],
where it is only assumed that the learner’s hypotheses are always consistent with the given
teaching set; (ii) the preference-based teaching model [17], where the learner has, for
any given concept class, a particular “preference relation” on the class, and the learner’s
hypotheses are always not only consistent with the given teaching set, but also not less
preferred to any other concept in the class w.r.t. the preference relation.
Fix an alphabet Σ.
Let Π be any class of patterns, and suppose π∈Π.
A teaching set for π w.r.t. Π is a set T ⊆ Σ* × {+,−} that is consistent with
π but with no other pattern in Π
(up to equivalence); here, T is consistent with π iff w ∈ L(π) for all (w,+) ∈ T and w ∉ L(π) for all (w,−) ∈ T.
The teaching dimension of π w.r.t. Π, denoted by TD(π,Π), is defined as
TD(π,Π) = inf{|T| : T is a teaching set for π w.r.t. Π}.
Furthermore, if Π′ ⊆ Π, then the teaching dimension of Π′ w.r.t. Π, denoted by TD(Π′,Π), is defined as TD(Π′,Π) = sup{TD(π,Π) : π ∈ Π′}.
The teaching dimension of Π, denoted by TD(Π), is defined as TD(Π,Π).
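For a finite class of concepts over a finite domain, the definition of TD can be evaluated by exhaustive search. The sketch below is a toy illustration under stated assumptions: concepts are given extensionally as finite sets, the function name `teaching_dimension` is hypothetical, and the example class consists of four unary pattern languages truncated to words of length at most 3, so the computed values need not coincide with the TD over all of Σ*.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Optional, Sequence


def teaching_dimension(target: FrozenSet[str],
                       concept_class: Iterable[FrozenSet[str]],
                       domain: Sequence[str]) -> Optional[int]:
    """Size of a smallest sample, labelled by `target`, that no rival fits."""
    rivals = [c for c in concept_class if c != target]
    for size in range(len(domain) + 1):
        for sample in combinations(domain, size):
            # A rival is excluded when it labels some sample word differently.
            if all(any((w in target) != (w in c) for w in sample) for c in rivals):
                return size
    return None  # some rival agrees with the target on the whole domain


# Toy class: languages of x1, 0x1, 00x1 and 0 over the alphabet {0},
# restricted to the words of length at most 3 (an assumption of this example).
domain = ["", "0", "00", "000"]
toy_class = {
    "x1": frozenset(domain),
    "0x1": frozenset({"0", "00", "000"}),
    "00x1": frozenset({"00", "000"}),
    "0": frozenset({"0"}),
}
td_values = {name: teaching_dimension(c, toy_class.values(), domain)
             for name, c in toy_class.items()}
```

The supremum of the values in `td_values` plays the role of TD(Π) for this truncated toy class.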
In real-world learning scenarios, even the smallest possible teaching set for a given
concept relative to some concept class may be impractically large.
Learning algorithms often make predictions based on a set of assumptions
known as the inductive bias, which may allow the algorithm
to infer a target concept from a small set of data even when there is more than
one concept in the class that is consistent with the data.
Certain types of bias impose an a priori preference ordering on the learner’s hypothesis
space; for example, an algorithm that adheres to the Minimum Description Length (MDL)
principle favours hypotheses that have shorter descriptions based on some given
description language. The preference-based teaching model, to be defined shortly, considers learning
algorithms with an inductive bias that specifies a preference ordering of the learner’s
hypotheses.
Let ≺ be a strict partial order on Π,
i.e., ≺ is asymmetric and transitive. The partial order that makes every pair π,π′ ∈ Π (where L(π) ≠ L(π′))
incomparable is denoted by ≺_∅. For every π ∈ Π, let
Π_{≺π} = {π′ ∈ Π : π′ ≺ π}
be the set of patterns over which π is strictly preferred (as mentioned
earlier, equivalent patterns are identified with each other).
A teaching set for π w.r.t. (Π,≺) is defined as
a teaching set for π w.r.t. Π ∖ Π_{≺π}. Furthermore define
PBTD(π,Π,≺) = inf{|T| : T is a teaching set for π w.r.t. (Π,≺)} ∈ N_0 ∪ {∞}.
The number PBTD(Π,≺) = sup_{π∈Π} PBTD(π,Π,≺) ∈ N_0 ∪ {∞}
is called the teaching dimension of (Π,≺).
The preference-based teaching dimension of Π is given by
PBTD(Π) = inf{PBTD(Π,≺) : ≺ is a strict partial order on Π}.
For all pattern classes Π and Π′ with Π′ ⊆ Π,
K(Π′) ≤ K(Π) for K ∈ {TD, PBTD} (i.e. the TD and PBTD are monotonic) and
PBTD(Π) ≤ TD(Π) [17].
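The effect of a preference relation can be seen on a small finite class. The following sketch is a toy illustration, not taken from the paper: the nested class over the domain {a, b, c}, the helper names and the particular relation (a concept is less preferred than the target when it is larger) are all assumptions made for this example.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Sequence

Concept = FrozenSet[str]


def min_sample_size(target: Concept, rivals: List[Concept],
                    domain: Sequence[str]) -> Optional[int]:
    """Smallest number of target-labelled examples excluding every rival."""
    for size in range(len(domain) + 1):
        for sample in combinations(domain, size):
            if all(any((w in target) != (w in c) for w in sample) for c in rivals):
                return size
    return None


def td(target: Concept, cls: List[Concept], domain: Sequence[str]) -> Optional[int]:
    return min_sample_size(target, [c for c in cls if c != target], domain)


def pbtd(target: Concept, cls: List[Concept], domain: Sequence[str],
         less_preferred: Callable[[Concept, Concept], bool]) -> Optional[int]:
    """Teach `target` against the class minus the concepts it is preferred over."""
    rivals = [c for c in cls if c != target and not less_preferred(c, target)]
    return min_sample_size(target, rivals, domain)


domain = ["a", "b", "c"]
chain = [frozenset(), frozenset("a"), frozenset("ab"), frozenset("abc")]


def larger_is_less_preferred(c: Concept, target: Concept) -> bool:
    # Hypothetical relation: c ≺ target iff c has more elements than target.
    return len(c) > len(target)


td_of_class = max(td(c, chain, domain) for c in chain)
pbtd_of_order = max(pbtd(c, chain, domain, larger_is_less_preferred) for c in chain)
```

On this chain the plain teaching dimension is 2 while the preference-based value for the chosen relation is 1, in line with the inequality PBTD(Π) ≤ TD(Π) stated above.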
4 Simple Block-Regular Patterns
Fix an alphabet Σ of size z ≤ ∞.
A pattern π ∈ Π^z is said to be simple block-regular if
it is of the shape X_1 a_1 X_2 a_2 … a_{n−1} X_n, where X_1,…,X_n ∈ X^+,
a_1,…,a_{n−1} ∈ Σ, and for all i ∈ {1,…,n},
X_i contains a variable that does not occur in any other variable block X_j with j ≠ i.
Every simple block-regular
pattern is equivalent to a pattern π′ of the shape
y_1 a_1 y_2 a_2 … a_k y_{k+1}, where
k ≥ 0, a_1, a_2, …, a_k ∈ Σ and y_1, y_2, …, y_{k+1}
are k+1 distinct variables [20, Theorem 6(b)]. SRΠ^z denotes
the class of all simple block-regular patterns in Π^z. SRΠ^z
is a subclass of the family of regular patterns (denoted by RΠ^z),
which are patterns in which every variable occurs at most once.
As mentioned in the introduction, the simple block-regular patterns constitute precisely
the subclass of finitely distinguishable patterns over any alphabet of size at least 4
[7, Theorem 3]. The language generated by a simple block-regular pattern
is known as a principal shuffle ideal in word combinatorics [25, §6.1],
and the family of all such languages is an important object of study in the PAC learning model [5].
The goal of this section is to determine the teaching complexity of the class of simple
block-regular patterns over any alphabet Σ w.r.t. three classes: SRΠ^{|Σ|} itself,
RΠ^{|Σ|} and Π^{|Σ|}.
It will be shown that TD(SRΠ^{|Σ|}) < TD(SRΠ^{|Σ|}, RΠ^{|Σ|}) < TD(SRΠ^{|Σ|}, Π^{|Σ|}).
To this end, we introduce a uniform construction of a certain negative example
for any given pattern π; as will be seen shortly, this example is powerful enough to
distinguish π from every simple block-regular pattern whose constant
part is a proper subsequence (not necessarily contiguous) of the constant part of π.
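Notation 1
For any word w = δ_1^{m_1} δ_2^{m_2} … δ_k^{m_k}, where δ_1,…,δ_k ∈ Σ
and δ_i ≠ δ_{i+1} whenever 1 ≤ i < k,
m_1,…,m_k ≥ 1 and k ≥ 1, define
\overline{w} := δ_1^{m_1−1} δ_2^{m_2} δ_1 δ_2^{m_2−1} δ_3^{m_3} δ_2 … δ_{k−1}^{m_{k−1}−1} δ_k^{m_k} δ_{k−1} δ_k^{m_k−1}.   (1)
(In particular, if m ≥ 1, then \overline{δ_1^m} = δ_1^{m−1}.)
Lemma 2
Fix any z ∈ N ∪ {∞} and any π,τ ∈ SRΠ^z with
π(ε) ≠ ε.
Then \overline{π(ε)} ∉ L(π).
Furthermore, if τ(ε) ⊏ π(ε), then
\overline{π(ε)} ∈ L(τ).
Proof. Suppose π(ε) = δ_1^{m_1} δ_2^{m_2} … δ_k^{m_k},
where δ_1,…,δ_k ∈ Σ and δ_i ≠ δ_{i+1}
whenever 1 ≤ i < k, m_1,…,m_k ≥ 1 and k ≥ 1.
That \overline{π(ε)} ∉ L(π) may be argued as follows: if
k = 1, then \overline{π(ε)} = δ_1^{m_1−1} ⊏ π(ε) is immediate;
if k ≥ 2, then one shows by induction that for i = 1,…,k−1, δ_1^{m_1} δ_2^{m_2} … δ_i^{m_i} δ_{i+1} ⋢ δ_1^{m_1−1} δ_2^{m_2} δ_1 δ_2^{m_2−1} δ_3^{m_3} δ_2 … δ_i^{m_i−1} δ_{i+1}^{m_{i+1}} δ_i.
For the second part of the lemma, suppose τ(ε) = δ_1^{n_1} δ_2^{n_2} … δ_k^{n_k},
where 0 ≤ n_i ≤ m_i for all i ∈ {1,…,k} and
n_{i_0} ≤ m_{i_0} − 1 for some least number i_0.
Taking w = π(ε) in Equation (1),
observe that δ_i^{n_i} ⊑ δ_i^{m_i−1} δ_{i+1}^{m_{i+1}} δ_i for all i < i_0,
δ_{i_0}^{n_{i_0}} ⊑ δ_{i_0}^{m_{i_0}−1},
and δ_j^{n_j} ⊑ δ_j^{m_j} δ_{j−1} δ_j^{m_j−1}
for all j > i_0. Thus, since τ is simple block-regular, one has that
\overline{π(ε)} ∈ L(τ).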
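The statement of Lemma 2 can be tested exhaustively on short words. The sketch below rests on two assumptions that should be checked against the paper: first, that the negative example is built blockwise in the way suggested by the factors appearing in the proof (block i contributes δ_i^{m_i−1} δ_{i+1}^{m_{i+1}} δ_i for i < k, and the last block contributes δ_k^{m_k−1}); second, that a word belongs to the language of a simple block-regular pattern exactly when the constant part of the pattern is a subsequence of the word. The function names are hypothetical.

```python
from itertools import groupby


def is_subsequence(u: str, v: str) -> bool:
    it = iter(v)
    return all(ch in it for ch in u)


def overline(w: str) -> str:
    """Blockwise construction of the negative example (assumed form)."""
    blocks = [(ch, len(list(g))) for ch, g in groupby(w)]
    out = []
    for i, (d, m) in enumerate(blocks):
        if i + 1 < len(blocks):
            nd, nm = blocks[i + 1]
            out.append(d * (m - 1) + nd * nm + d)
        else:
            out.append(d * (m - 1))
    return "".join(out)


def check_lemma2(w: str) -> bool:
    """w is not a subsequence of overline(w), but each proper subsequence is.

    It suffices to test deletions of a single letter, since every proper
    subsequence of w is a subsequence of one of them.
    """
    bar = overline(w)
    if is_subsequence(w, bar):
        return False
    one_deleted = {w[:i] + w[i + 1:] for i in range(len(w))}
    return all(is_subsequence(u, bar) for u in one_deleted)
```

For example, under these assumptions the word 0011 yields 01101, which does not contain 0011 as a subsequence but contains both 001 and 011.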
Lemma 2 now provides a tool for establishing the
TD of SRΠ^z.
Theorem 3
For any z ∈ N ∪ {∞}, TD(SRΠ^z) = 2 and PBTD(SRΠ^z) = 1.
Proof. Fix any 0 ∈ Σ. The pattern π := x_1 0 x_2 needs to be taught with
at least one negative example in order to distinguish it from x_1.
Suppose a teaching set for π contains (w_1 w_2 … w_k, −), where w_1,…,w_k ∈ Σ. For any m ≥ 3, w_1 w_2 … w_k ∉ L(π′), where π′ := x_1 w_1 x_2 w_2 x_3 … x_k w_k x_{k+1} 0 x_{k+2} 0 … 0 x_{k+m}. Since π′ is simple block-regular and L(π′) ≠ L(π),
at least one additional example is required to distinguish π from π′. Hence TD(SRΠ^z) ≥ 2.
Let π be any simple block-regular pattern. Since x_1 can be taught
with the single example (ε,+), we will suppose that
π(ε) ≠ ε.
A teaching set for π consists of the two examples (π(ε),+)
and (\overline{π(ε)},−). By Lemma 2, (\overline{π(ε)},−)
is consistent with π and distinguishes π
from all patterns π′ such that π′(ε) ⊏ π(ε),
while (π(ε),+) distinguishes π from all patterns
π′′ such that π′′(ε) ⋢ π(ε).
Let ≺ be a preference relation on
SRΠ^z such that for any π,τ ∈ SRΠ^z with L(π) ≠ L(τ),
π ≺ τ iff |π(ε)| < |τ(ε)|.
Every π ∈ SRΠ^z can be taught w.r.t. (SRΠ^z, ≺)
using the example (π(ε),+): for every τ ∈ SRΠ^z such that
L(τ) ≠ L(π) and π(ε) ∈ L(τ), τ(ε) ⊏ π(ε);
thus |τ(ε)| < |π(ε)| and so π ≻ τ.
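The preference-based part of the argument can be simulated for short words. In the sketch below, a simple block-regular pattern is represented only by its constant part, on the assumption that its language is the set of words containing that constant part as a subsequence; the learner, given one positive example, keeps the consistent candidates with the longest constant part. The function name `learner_guess` and the restriction to a binary alphabet are choices made for this illustration.

```python
from itertools import product
from typing import List


def is_subsequence(u: str, v: str) -> bool:
    it = iter(v)
    return all(ch in it for ch in u)


def learner_guess(positive_word: str, alphabet: str = "01") -> List[str]:
    """Most preferred constant parts consistent with (positive_word, +).

    A candidate d is consistent iff d is a subsequence of positive_word;
    longer constant parts are preferred, so only the longest are returned.
    """
    n = len(positive_word)
    candidates = ["".join(t) for k in range(n + 1)
                  for t in product(alphabet, repeat=k)]
    consistent = [d for d in candidates if is_subsequence(d, positive_word)]
    best = max(len(d) for d in consistent)
    return [d for d in consistent if len(d) == best]
```

Teaching a pattern with constant part c by the single example (c,+) then leaves c itself as the unique most preferred candidate, which is the reason one example suffices in this model.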
Not surprisingly, the TD of a simple block-regular
pattern is in general larger w.r.t. the whole class
of regular patterns than w.r.t. the restricted class of
simple block-regular patterns. It might be worth noting that a smallest teaching set for a
simple block-regular pattern π need not necessarily contain π(ε) as a positive example, as the proof of the following result (cf. Appendices C
and E) shows.
Theorem 4
TD(SRΠ^z, RΠ^z) = 3.
To prove the lower bound in Theorem 4, it suffices to observe that any teaching set
(w.r.t. the whole class of regular patterns) for a non-constant
regular pattern not equivalent to x1 must contain at least two
positive examples and one negative example; for a very
similar proof, see [7, Theorem 12.1]. We prove the upper bound.
If z = 1, then RΠ^z is the union of SRΠ^z and all constant patterns (up to
equivalence). By the proof of Theorem 3, any
π ∈ SRΠ^z can be distinguished from every non-equivalent τ ∈ SRΠ^z
with one positive example or one positive and one negative example; to distinguish π
from any constant pattern, at most one additional positive example is needed.
Suppose z ≥ 2.
The proof will be split into the cases (i) |Σ| = 2
and (ii) |Σ| ≥ 3.
Lemma 5
If π ∈ SRΠ^2, then TD(π, RΠ^2) ≤ 3.
The basic proof idea of Lemma 5, namely using
positive examples to exclude certain types of constant segments of the target pattern,
can also be generalised to the case |Σ| ≥ 3, although the details of the
construction are more tedious.
Lemma 6
Suppose z = |Σ| ≥ 3.
If π ∈ SRΠ^z, then TD(π, RΠ^z) ≤ 3.
The next result determines upper (for |Σ| ∈ {1,∞}) and lower (for |Σ| ∈ N ∪ {∞}) bounds
for the TD of any given simple block-regular
pattern w.r.t. the whole class of patterns.
It turns out that these bounds vary with the alphabet size.
Theorem 7
Suppose z ∈ N ∪ {∞} and π = x_1 c_1 x_2 … c_{n−1} x_n
for some c_1,…,c_{n−1} ∈ Σ and n ≥ 2.
(i) If z ∈ {1,∞}, then TD(π, Π^z) ∈ {1,3}.
(ii) If 2 ≤ z < ∞, then TD(π, Π^z) = Ω(|π|).
We do not know whether the lower bound given in Assertion (ii) of Theorem
7 is also an upper bound (up to numerical
constant factors). In the proof of [7, Proposition 4],
it was shown that the TD of every simple block-regular pattern π is O(2^{|π|}).
5 Finite Distinguishability of m-Quasi-Regular, Non-Cross m-Regular and m-Regular Patterns
This section studies the problem of determining the subclass
of finitely distinguishable patterns w.r.t. three classes: the m-quasi-regular patterns,
the non-cross m-regular patterns, and the m-regular patterns. The first two classes are
interesting from an algorithmic learning perspective as they provide natural examples
of pattern language families that are learnable in the limit [28, 31]. (Roughly speaking, a class of languages
is learnable in the limit if there is a learning algorithm such that, given any infinite sequence of all positive
examples for any language L in the class,
the algorithm outputs a corresponding sequence of guesses for the target language (based on a representation
system for the languages in the class) that converges to a fixed representation for L;
this model is due to Gold [16].)
The m-regular patterns are a fairly natural generalisation
of the m-quasi-regular patterns; as will be seen later, the class of constant-free
4-regular patterns is not identifiable in the limit over binary alphabets, and
in particular, not all m-regular patterns are finitely distinguishable over
binary alphabets.
Notation 8
Fix any ℓ ≥ 0 and z, m ≥ 1. An ℓ-variable pattern is one that has at most ℓ distinct variables.
Let Π^z_{ℓ,m} denote the class of ℓ-variable patterns π such that every
variable occurs at most m times in π; if ℓ = ∞,
then there is no uniform upper bound on the number of distinct variables occurring
in any π ∈ Π^z_{ℓ,m}; if m = ∞, then there is no uniform upper
bound on the number of times any variable can occur. We call every π ∈ Π^z_{∞,m} an m-regular pattern.
Π^z_{∞,m,cf} denotes the class of all constant-free m-regular patterns.
Let QRΠ^z_{ℓ,m} denote the class of all ℓ-variable patterns π such
that every variable of π occurs exactly m times; again, if ℓ = ∞, then there
is no uniform upper bound on the number of distinct variables occurring in
any π ∈ QRΠ^z_{ℓ,m}. Every π ∈ QRΠ^z_{∞,m} is known as an
m-quasi-regular pattern [28]. We denote the class of constant-free m-quasi-regular patterns
by QRΠ^z_{∞,m,cf}.
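The classes introduced in Notation 8 are all defined through occurrence counts of variables, which the following sketch computes directly. The encoding is an assumption of this example: a pattern is a list of symbols and a symbol is treated as a variable exactly when it starts with the letter x; the function names are likewise hypothetical.

```python
from collections import Counter
from typing import List


def is_var(sym: str) -> bool:
    return sym.startswith("x")  # encoding convention of this sketch only


def variable_counts(pattern: List[str]) -> Counter:
    return Counter(s for s in pattern if is_var(s))


def variable_frequency(pattern: List[str]) -> int:
    """Largest number of occurrences of any single variable (0 if none)."""
    return max(variable_counts(pattern).values(), default=0)


def is_m_regular(pattern: List[str], m: int) -> bool:
    """Every variable occurs at most m times."""
    return variable_frequency(pattern) <= m


def is_m_quasi_regular(pattern: List[str], m: int) -> bool:
    """Every variable occurs exactly m times (vacuously true without variables)."""
    return all(c == m for c in variable_counts(pattern).values())


def is_constant_free(pattern: List[str]) -> bool:
    return all(is_var(s) for s in pattern)
```

For instance, the hypothetical pattern x1 0 x2 x1 1 x2 has variable frequency 2, so it is 2-regular and 2-quasi-regular but not regular, and it is not constant-free.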
Mitchell [28] showed that for any m ≥ 1, the class of m-quasi-regular
pattern languages is learnable in the limit. The next theorem shows that for all
z ≥ 1, every m-quasi-regular pattern even has a finite teaching
set w.r.t. QRΠ^z_{∞,m}. Thus, at least as far as m-quasi-regular
patterns are concerned, version space learning with a helpful teacher is just as powerful
as learning in the limit. We begin with a lemma, which states that for any given
m-quasi-regular pattern π and every m-quasi-regular pattern τ with L(τ) ⊈ L(π),
there is some S ⊆ Var(τ) of size at most linear in |Var(π)| for which L(τ|_{Σ∪S}) ⊈ L(π);
for any S′ ⊆ X ∪ Σ, τ|_{S′} is the subsequence of τ
obtained by deleting symbols not in S′.
Lemma 9
Fix Σ with z = |Σ| ≥ 2 and {0,1} ⊆ Σ.
Suppose m ≥ 1 and π,τ ∈ QRΠ^z_{∞,m}.
If τ(ε) = π(ε) and L(τ) ⊈ L(π),
then there is some S ⊆ Var(τ) with |S| ≤ 1 + (|π(ε)| + m + 4)·|Var(π)|
such that L(τ|_{Σ∪S}) ⊈ L(π).
Theorem 10
If z = 1, then TD(QRΠ^z_{∞,m}) = 3. If
z ≥ 2, then for every π ∈ QRΠ^z_{∞,m}, TD(π, QRΠ^z_{∞,m}) = O(2^{|π(ε)|} + D·(|π(ε)| + D·m)^{D·m}),
where D := max({(1/m)·(2·|π| − |π(ε)|), 1 + (|π(ε)| + m + 4)·|Var(π)|}).
Next, we show that the PBTD of the class of constant-free m-quasi-regular pattern languages
is exactly 1 for large enough alphabet sizes. We establish this value by observing that
if the adjacency graph of a constant-free m-quasi-regular pattern π [26, Chapter 3] has a
colouring satisfying certain conditions, where each colour corresponds to a letter in the alphabet,
then such a colouring can be used to construct a positive example for π that
distinguishes it from all shorter constant-free m-quasi-regular patterns.
Theorem 11
For any z ≥ 1, TD(QRΠ^z_{∞,1,cf}) = PBTD(QRΠ^z_{∞,1,cf}) = 0.
Suppose m ≥ 2. If z = |Σ| ≥ 4m^2 + 1, then PBTD(QRΠ^z_{∞,m,cf}) = 1.
While the PBTD of the class of m-quasi-regular patterns remains open in full generality, we observe
that over unary alphabets, the PBTD of this class is exactly 2 for any m ≥ 1.
Proposition 12
For any m ≥ 1, PBTD(QRΠ^1_{∞,m}) = 2.
If z ≥ 2, then PBTD(QRΠ^z_{∞,m}) ≥ 2.
A non-cross pattern π is a constant-free pattern of the shape
x_0^{n_0} x_1^{n_1} … x_k^{n_k}, where n_0, n_1, …, n_k ∈ N.
Let NCΠ^z_{∞,m} denote the class of all non-cross patterns π over any Σ
with |Σ| = z such that every variable of π occurs at most m times.
NCΠ^z_{∞,∞} coincides with NCΠ^z, the class of
all non-cross patterns.
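Whether a pattern is non-cross, and which exponents n_0,…,n_k it has, can be read off from its maximal runs of equal symbols. The sketch below assumes that the variables x_0,…,x_k in the definition are pairwise distinct, so that all occurrences of a variable form one contiguous block; it also assumes that variables are encoded as symbols starting with x. The function names are hypothetical.

```python
from itertools import groupby
from typing import List, Optional


def non_cross_exponents(pattern: List[str]) -> Optional[List[int]]:
    """Return [n_0, ..., n_k] if `pattern` is non-cross, otherwise None."""
    if any(not s.startswith("x") for s in pattern):
        return None  # a non-cross pattern is constant-free
    runs = [(v, len(list(g))) for v, g in groupby(pattern)]
    names = [v for v, _ in runs]
    if len(set(names)) != len(names):
        return None  # some variable occurs in two separate blocks
    return [n for _, n in runs]


def in_nc_class(pattern: List[str], m: int) -> bool:
    """Non-cross and every variable occurs at most m times."""
    exps = non_cross_exponents(pattern)
    return exps is not None and all(n <= m for n in exps)
```

A pattern such as x1 x2 x1 is rejected because the occurrences of x1 are separated by another variable.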
The next main result shows that for any fixed m, the TD of every pattern in NCΠ^z_{∞,m}
is not only finite, but also has a uniform upper bound depending only on m.
Slightly more interestingly, the teaching complexity of NCΠ^z_{∞,m} in the preference-based
teaching model varies with the alphabet size when m ≥ 2: over unary alphabets, the PBTD of this class is
exactly linear in m, while over alphabets of size at least 2, the PBTD is exactly 1.
In the following lemma, we observe certain properties of an “unambiguous” word that
was constructed in [31, Lemma 13].
Lemma 13
(Based on [31, Lemma 13])
Suppose {0,1} ⊆ Σ. Fix any m ≥ 2, and let π = x_0^{n_0} … x_k^{n_k},
where n_0,…,n_k ∈ {2,…,m}. Suppose there are positive numbers ℓ and i_1,…,i_ℓ such that
[TABLE]
where, for each j ∈ {1,…,ℓ}, I_j is the closed interval of positions of w occupied
by the subword (0^j 1)^{i_j} as indicated with braces in Equation (2).
For each j ∈ {0,…,k}, let J_j denote the closed interval of positions of π occupied by x_j^{n_j}.
Let h be any substitution such that h(π) = w and h(x_i) ≠ ε for all i ∈ {0,…,k}.
Then the following hold.
(i) For all j ∈ {0,…,k}, h(x_j) is of the shape (0^{j′} 1)^{i′} for some
j′ ∈ {1,…,ℓ} and i′ ∈ {1,…,i_{j′}}.
(ii) For each j ∈ {1,…,ℓ}, there are g_j ∈ {0,…,k} and h_j ∈ {0,…,k−g_j} such that I_j = ∐_{l=0}^{h_j} I_{h,π}(J_{g_j+l}).
Theorem 14
For all z ∈ N ∪ {∞}, TD(NCΠ^z_{∞,1}) = PBTD(NCΠ^z_{∞,1}) = 0.
Suppose m ≥ 2.
(i) If z = 1, then TD(NCΠ^z_{∞,m}) = Θ(m) and
PBTD(NCΠ^z_{∞,m}) = Θ(m).
(ii) For any n ∈ N_0, let ω(n) denote the number of distinct prime factors of n
and let Π(n) denote the number of prime powers not exceeding n.
If z ≥ 2, then max({ω(n) : n ≤ m}) ≤ TD(NCΠ^z_{∞,m}) ≤ 2 + Π(m−1) and PBTD(NCΠ^z_{∞,m}) = PBTD(NCΠ^z) = 1. In particular, max({ω(n) : n ≤ m}) ≤ TD(NCΠ^z_{∞,m}) < O((m−1)^{1/2} log(m−1)) + 1.25506(m−1)/log(m−1).
It is possible that neither the lower bound nor the upper bound on TD(NCΠ^z_{∞,m}) given in
Theorem 14 is tight for almost all m.
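The two number-theoretic quantities in Assertion (ii) are simple to tabulate for small m. The following sketch assumes that a prime power means p^k with p prime and k ≥ 1 (so that 1 is not counted), which is an interpretation made for this example; the function names are hypothetical, and the returned pair is only the lower and upper bound on the TD for alphabets of size at least 2, not the TD itself.

```python
from typing import Tuple


def distinct_prime_factors(n: int) -> int:
    """ω(n): number of distinct primes dividing n (0 for n <= 1)."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)


def is_prime_power(n: int) -> bool:
    return n >= 2 and distinct_prime_factors(n) == 1


def prime_powers_up_to(n: int) -> int:
    """Number of prime powers p^k (k >= 1) not exceeding n."""
    return sum(1 for q in range(2, n + 1) if is_prime_power(q))


def td_bounds_non_cross(m: int) -> Tuple[int, int]:
    """(max ω(n) over n <= m, 2 + number of prime powers <= m - 1)."""
    lower = max(distinct_prime_factors(n) for n in range(1, m + 1))
    upper = 2 + prime_powers_up_to(m - 1)
    return lower, upper
```

For m = 9 the prime powers not exceeding 8 are 2, 3, 4, 5, 7 and 8, so the pair of bounds is (2, 8) under the stated interpretation.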
The proof of Theorem 14 (cf. Appendix N) shows that
the TD of any general
non-cross pattern π w.r.t. NCΠ^z_{∞,m} (for any fixed z ≥ 2 and
m ≥ 2) is at most 2 plus the number of maximal proper prime power factors of the variable frequencies
of π, but as the following example shows, this upper bound is not always sharp even for non-cross
succinct patterns with three variables; a pattern π is succinct [28, 32] iff there is no
pattern τ such that L(τ) = L(π) and |τ| < |π|.
Example 15
Suppose {0,1} ⊆ Σ. Let π = x_1^4 x_2^8 x_3^9.
There are 3 maximal proper prime power factors of 4, 8 and 9, namely,
2, 4 and 3, and so by the proof of Theorem 14,
the TD of π w.r.t. NCΠ^{|Σ|}_{∞,9} is at most
2 + 3 = 5. However, π has a teaching set of size 4 (further details are given in
Appendix O).
The next result exemplifies the general observation that a larger alphabet allows pattern languages to be distinguished
using a relatively smaller number of labelled examples.
Theorem 16
PBTD(Π^∞) = 2 and for any m ≥ 1, PBTD(Π^1_{∞,m}) = Θ(m).
The next series of results deal with the finite distinguishability problem
for the general class of m-regular patterns. We begin with a few preparatory results.
The first part of Theorem 17
gives a sufficient criterion for the inclusion of pattern languages, and it was observed
by Jiang, Kinber, Salomaa and Yu [22]; the second part, due to
Ohlebusch and Ukkonen [30], states that the existence of a constant-preserving
morphism from π to τ (where π and τ are similar) is also necessary for
L(τ) ⊆ L(π) if Σ contains at least two letters that do not
occur in π or τ. The second result is based on a few lemmas due to
Reidenbach [32, Lemmas 4–6], adapted to the
case of general patterns over an infinite alphabet.
Theorem 17
[22, 30]
Let Σ be an alphabet, and let π,τ ∈ Π^{|Σ|}.
Then L(π) ⊆ L(τ) if there exists a constant-preserving
morphism g: (X ∪ Σ)* → (X ∪ Σ)* with g(τ) = π.
If |Σ| ≥ |Const(π)| + 2, |Σ| ≥ |Const(τ)| + 2 and
π is similar to τ, then L(π) ⊆ L(τ) only if there exists a
constant-preserving morphism g: (X ∪ Σ)* → (X ∪ Σ)* with g(τ) = π.
Lemma 18
(Based on [32])
Suppose |Σ| = ∞. Fix any π ∈ Π^∞ such that
π is succinct. Let Y = {y_1, y_2, …} be an infinite set
of variables such that Y ∩ Var(π) = ∅. Suppose τ ∈ π ⧢ Y*. Then L(τ) = L(π) iff
(i) For all Y′ ∈ Y^+ and δ,δ′ ∈ Const(π), the following hold:
(a) Y′δ is not a prefix of τ, (b) δY′ is not a suffix of τ,
(c) δY′δ′ is not a substring of τ;
(ii) There is a constant-preserving morphism g: (X ∪ Σ)* → (X ∪ Σ)*
such that g(π) = τ;
(iii) For all constant-preserving morphisms h: (X ∪ Σ)* → (X ∪ Σ)*
with h(π) = τ and for all x ∈ Var(π), if there exist Y1,Y2 ∈ Y* such that Y1xY2 is a substring of τ and Y1 (resp. Y2)
is not immediately preceded (resp. succeeded) by any y ∈ Y w.r.t. τ,
then there are splittings Y11Y12 and Y21Y22 of Y1
and Y2 respectively for which h(x) = Y12xY21.
The next crucial lemma shows that for any fixed m≥1, only finitely
many negative examples are needed to distinguish a succinct pattern
π from all patterns π′∈Π^∞_{∞,m} obtained by
shuffling π with an infinite set Y of variables such that Y and
Var(π) are disjoint.
Lemma 19
Fix Σ with ∣Σ∣=∞.
Suppose k≥0, m≥1 and π∈Π^∞_{k,m}.
Let Y={y1,y2,…} be an infinite set of variables such that
Y∩Var(π)=∅. Suppose τ∈(π ⧢ Y∗)∩Π^∞_{∞,m}. There is some
τ′∈Π^∞_{4mk+∣π∣+2,m} such that τ′=τ|_{Σ∪Var(π)∪S} for some finite S⊂Y,
and if L(π)⊂L(τ), then L(π)⊂L(τ′).
Theorem 20
Suppose m≥1.
(i)
TD(Π^1_{∞,m})≤2m+m+1
and for all π∈Π^∞_{k,m} with k≥1, TD(π,Π^∞_{∞,m})=O((D+1)D),
where D:=(4mk+∣π∣+2)⋅m.
(ii)
Let 1Π^z_m denote the class of patterns π over any alphabet of size z such
that π contains at most one variable that occurs more than m times. Suppose
π∈1Π^z_m.
If z≥4, then TD(π,1Π^z_m)<∞ only if π contains a variable that occurs more than
m times or π∈SRΠ^z. If z=∞, then TD(π,1Π^z_m)<∞ if π contains a variable that occurs more than
m times or π∈SRΠ^z.
The next result shows that over binary alphabets, even the class
of constant-free 4-regular pattern languages contains patterns with infinite TD. We prove this by modifying Reidenbach’s [31] proof of the non-learnability
of x_1^2 x_2^2 x_3^2 so that every pattern constructed in the proof has variable frequency
at most 4.
Theorem 21
(Based on [31, Theorem 5])
Suppose π=x_1^2 x_2^2 x_3^2. For any m≥4, TD(π,Π^2_{∞,m,cf})=∞.
Remark 22
The lower bound 4 on m in Theorem 21 is tight
in the sense that the TD of π:=x_1^2 x_2^2 x_3^2
w.r.t. Π^2_{∞,3} is finite. In fact, T:={(ε,+),(0^2 1^2 0^2,+),(0,−),(0 1^2 0,−),(0^3,−),((01)^2(0^2 1)^2(0^3 1)^2(0^4 1)^2,−)}
is a teaching set for π w.r.t. Π^2_{∞,3} (further details are
given in Appendix V).
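As a quick plausibility check of Remark 22, one can verify mechanically that π = x_1^2 x_2^2 x_3^2 is consistent with every labelled example in T. The Python sketch below is an illustration added here, not the argument of Appendix V: the token encoding and the function name `in_language` are assumptions, and the membership test is a naive backtracking search that is only practical for short words. It does not show that T is a teaching set, since that would also require ruling out every other 3-regular pattern over the binary alphabet.

```python
def in_language(pattern, word):
    """Decide whether `word` lies in the erasing language of `pattern` by
    backtracking over all images (including the empty word) of each variable.
    Tokens starting with 'x' are variables; all other tokens are constants."""
    def go(i, pos, env):
        if i == len(pattern):
            return pos == len(word)
        tok = pattern[i]
        if not tok.startswith("x"):                      # constant letter
            return word.startswith(tok, pos) and go(i + 1, pos + len(tok), env)
        if tok in env:                                   # variable already bound
            val = env[tok]
            return word.startswith(val, pos) and go(i + 1, pos + len(val), env)
        for end in range(pos, len(word) + 1):            # try every image
            env[tok] = word[pos:end]
            if go(i + 1, end, env):
                return True
        del env[tok]
        return False
    return go(0, 0, {})

PI = ["x1", "x1", "x2", "x2", "x3", "x3"]               # x_1^2 x_2^2 x_3^2

# The six labelled examples of T, written out over {0,1}.
T = [
    ("", True),
    ("001100", True),
    ("0", False),
    ("0110", False),
    ("000", False),
    ("01" * 2 + "001" * 2 + "0001" * 2 + "00001" * 2, False),
]

consistent = all(in_language(PI, w) == label for w, label in T)
print(consistent)
```

The last negative example is a product of four squares; no factor of it is a square containing more than two occurrences of 1, so three squares cannot cover its eight occurrences of 1.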
6 Conclusion
Table 1 summarises some of the main results of this paper.
For three types of pattern classes studied – the simple block-regular,
m-quasi-regular and m-regular non-cross patterns – it was found
that over any alphabet size, every pattern in the class is finitely distinguishable;
in the case of simple block-regular and m-regular non-cross patterns, one
also has an upper bound on the TD of the class of such patterns that is,
depending on the alphabet size, constant, linear or sublinear in m.
The most delicate questions appear to be those concerning the m-regular patterns for finite alphabets
of size at least 2; we only know that for all m≥4, there are patterns in
Π^2_{∞,m,cf} that are not finitely distinguishable (and even not learnable in the limit).
We note that the class of non-cross patterns over any
alphabet and the class of all patterns over infinite alphabets are
learnable in the limit [31, 28] (this implies that for every pattern π belonging to any one of these classes,
L(π) contains a finite set that distinguishes π from all π′ in the class such that
L(π′)⊂L(π) [4, Theorem 1]), but they have relatively restricted subclasses
of finitely distinguishable patterns [7, Theorems 3,10]. Thus the
fact that every pattern in the m-regular versions of these classes has a finite
TD suggests that the variable frequency of a pattern class may play
a role in determining whether any given pattern π can be finitely distinguished
from all π′ such that L(π′)⊆L(π). On the other hand, we have seen in Theorem 20(ii) that
even constant patterns cannot be finitely distinguished w.r.t. the class of patterns with at
most one variable (but no uniform upper bound on the number of variable occurrences).
It might be interesting to know whether there is a ‘natural’ class Π of m-regular patterns such that Π is
learnable in the limit but TD(π,Π)=∞ for some π∈Π.
We also suspect that TD(Π^∞_{∞,m})=∞ for some m≥2 and TD(QRΠ^z_{∞,m})=∞
for some finite z≥2 and m≥1, but as yet do not know how to prove this.
Acknowledgements. The author
was supported (as RF) by the Singapore Ministry
of Education Academic Research Fund grant MOE2016-T2-1-019 / R146-000-234-112.
I sincerely thank Fahimeh Bayeh, Sanjay Jain and
Sandra Zilles for proofreading the manuscript; their numerous suggestions for corrections
and improvements are gratefully acknowledged. I also thank Fahimeh Bayeh very much for her
suggestion to look at the PBTD of m-quasi-regular patterns over unary
alphabets.
Appendix
This appendix contains the proofs not presented in the main part of the paper
as well as additional definitions/notation and examples.
A Additional Definitions and Notation
In this section, we introduce additional definitions and notation needed
for the proofs in the appendix.
Given any x∈X, let N(x,π) denote the set of all s∈X∪Σ
such that s is adjacent to an occurrence of x in π; call N(x,π)
the neighbourhood of x in π.
For each δ∈Σ and w∈(X∪Σ)∗,
#(δ)[w] denotes the number of occurrences of δ in w.
If ∣Σ∣=2 and
δ∈Σ, then δ̄ denotes the unique element
of Σ∖{δ}.
For any π∈(X∪Σ)+ and
variables xi1,…,xin occurring in π, let
π[xi1→α1,…,xin→αn] denote
the word obtained from π by substituting αj for xij
whenever j∈{1,…,n} and substituting ε for every other
variable. We will often assume that a pattern π∈Πz is normalised in the sense
that the k variables occurring in π are
named x1,…,xk in order of their first occurrences from left to right (or x if k=1).
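The notation introduced so far translates directly into a few lines of code. The following Python sketch is added for illustration under assumptions made here: patterns are encoded as lists of tokens, tokens beginning with `x` are variables, and the function names are hypothetical. It covers the neighbourhood N(x,π), the count #(δ)[w], the substitution π[x_{i1}→α1,…,x_{in}→αn] (which erases every variable not listed, so the empty assignment yields π(ε)), and normalisation.

```python
def neighbourhood(x, pattern):
    """N(x, pi): all symbols adjacent to some occurrence of x in pi."""
    out = set()
    for i, tok in enumerate(pattern):
        if tok == x:
            if i > 0:
                out.add(pattern[i - 1])
            if i + 1 < len(pattern):
                out.add(pattern[i + 1])
    return out

def count_letter(delta, w):
    """#(delta)[w]: number of occurrences of the constant delta in w."""
    return sum(1 for tok in w if tok == delta)

def substitute(pattern, assignment):
    """pi[x_i1 -> a1, ...]: listed variables are replaced by their images,
    every other variable is replaced by the empty word."""
    return "".join(assignment.get(t, "") if t.startswith("x") else t
                   for t in pattern)

def normalise(pattern):
    """Rename variables x1, x2, ... in order of first occurrence."""
    names, out = {}, []
    for tok in pattern:
        if tok.startswith("x"):
            names.setdefault(tok, f"x{len(names) + 1}")
            out.append(names[tok])
        else:
            out.append(tok)
    return out

# Sample pattern x1 0 x2 0 x3 1 x4 1 x5 over the alphabet {0,1}.
PI = ["x1", "0", "x2", "0", "x3", "1", "x4", "1", "x5"]
print(neighbourhood("x3", PI))        # the two letters 0 and 1
print(substitute(PI, {}))             # pi(epsilon)
print(substitute(PI, {"x3": "1"}))    # pi[x3 -> 1]
```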
Given any pattern π and substitution h:X↦Σ∗, h induces a mapping of closed intervals of positions of
π to closed intervals of positions of h(π). This mapping will
be denoted by I_{h,π}. For any position p of π,
I_{h,π}({p}) will simply be written as I_{h,π}(p). We
define the inverse of I_{h,π}, denoted Ī_{h,π},
to be the mapping of closed intervals of positions of h(π) to closed
intervals of positions of π such that for all closed intervals J⊆{1,…,∣h(π)∣}, Ī_{h,π}(J) is the smallest closed
interval I⊆{1,…,∣π∣} such that J⊆I_{h,π}(I) (in other words, J⊆I_{h,π}(I) and for all
I′⊂I, J⊈I_{h,π}(I′)). For any position q of
h(π), Ī_{h,π}({q}) will be abbreviated to
Ī_{h,π}(q).
Fix any z=∣Σ∣≥1 and π∈Πz.
Suppose that γ∈L(π) for some γ∈Σ∗,
as witnessed by the substitution h:X↦Σ∗.
We define a cut of γ relative to (h,π) to be
any pair (I1,I2) of disjoint nonempty closed intervals of positions
of γ such that I1=[r1,r2] and I2=[r2+1,r3] for some
r1,r2,r3∈{1,…,∣γ∣}, and there exists
q∈{1,…,∣π∣} with Ih,π(q)=I1 and
Ih,π(q+1)=I2. If (I1,I2) is a cut of γ
relative to (h,π), then the right endpoint of I1 (which is one less
than the left endpoint of I2) will be called a cut-point of γ relative to (h,π). If the choice of (h,π) is clear
from the context, then (I1,I2) (resp. the right endpoint of I1) will
simply be called a cut of γ (resp. cut-point of γ).
Example A.1
[8]
Let π=x1x2x1x2x1 and γ=0111011101.
Then h:X↦Σ∗, defined by h(x1)=01 and h(x2)=11,
witnesses γ∈L(π). One has that
[TABLE]
and (I1,I2) (where the positions of γ occupied by I1 and I2 are
illustrated in Equation (3)) is a cut of γ relative
to (h,π); the corresponding cut-point of γ relative to (h,π) is 2.
The following basic lemma elucidates the connection between the number
of cuts of h(π) and the length of π. It will be useful
in subsequent results for showing that L(π) cannot contain certain
words.
Lemma A.2
[8]
If γ has d distinct cuts relative to (h,π), then ∣π∣≥d+1.
Proof. Given any two consecutive cuts (I1,I2) and (J1,J2) of γ such that
the left endpoint of I1 is smaller than the left endpoint of J1,
I1=I2 and I2=J2 together imply that I2=J2. Hence I1,
I2 and J2 correspond to three different positions of π.
B Example of the Mappings I_{h,π} and Ī_{h,π}
Example B.1
[8]
Suppose Σ={a,b} and π=x1x2x1x2x1.
Let h:X↦Σ∗ be the substitution defined by h(x1)=ab
and h(x2)=bb.
Then γ:=h(π)∈L(π) and one has that I_{h,π}([1,2])=[1,4], I_{h,π}([4,5])=[7,10] and Ī_{h,π}(5)={3}.
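The interval mappings and the notion of a cut can be computed directly from π and h. The Python sketch below is an illustration added here under stated assumptions: positions are 1-indexed, patterns are lists of tokens, h is a dictionary that assigns an image to every variable of the pattern, and all function names are hypothetical. It reproduces the values of Example B.1 and lists the cut-points for the substitution of Example A.1, which also illustrates the bound of Lemma A.2.

```python
def position_intervals(pattern, h):
    """For each position of the pattern, the closed interval of h(pattern) it
    occupies (1-indexed), or None if its image is empty. Constants are tokens
    absent from h and map to themselves."""
    intervals, end = [], 0
    for tok in pattern:
        image = h.get(tok, tok)
        intervals.append((end + 1, end + len(image)) if image else None)
        end += len(image)
    return intervals

def forward(intervals, p1, p2):
    """Image of the closed interval [p1, p2] of pattern positions."""
    spans = [s for s in intervals[p1 - 1:p2] if s is not None]
    return (spans[0][0], spans[-1][1]) if spans else None

def inverse(intervals, q1, q2):
    """Smallest closed interval of pattern positions whose image covers
    [q1, q2]; assumes 1 <= q1 <= q2 <= |h(pattern)|."""
    hits = [p for p, s in enumerate(intervals, start=1)
            if s is not None and s[0] <= q2 and q1 <= s[1]]
    return (hits[0], hits[-1])

def cut_points(intervals):
    """Right endpoints of I(q) for consecutive positions q, q+1 whose images
    are both nonempty."""
    return [a[1] for a, b in zip(intervals, intervals[1:]) if a and b]

P = ["x1", "x2", "x1", "x2", "x1"]

iv_b = position_intervals(P, {"x1": "ab", "x2": "bb"})   # Example B.1
print(forward(iv_b, 1, 2), forward(iv_b, 4, 5), inverse(iv_b, 5, 5))

iv_a = position_intervals(P, {"x1": "01", "x2": "11"})   # Example A.1
cuts = cut_points(iv_a)
print(cuts, len(P) >= len(cuts) + 1)
```

Here the interval [3,3] returned by `inverse` corresponds to the set {3} in Example B.1.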
Proof. Suppose π=x1δ1x2δ2…δn−1xn, where δ1,δ2,…,δn−1∈Σ. We build a teaching set T for π w.r.t. RΠ2. Let
τ denote any regular pattern that is consistent with T.
Let w1 be the word obtained from π as follows: first, substitute δ1
for x1 and substitute δn−1 for xn; second, for every
substring of π of the shape δxiδ, where δ∈Σ,
replace xi with δ; all other variables are replaced with ε.
Next, let w2 be the word obtained from π such that for every substring
of π of the shape δxiδ, where δ∈Σ,
xi is replaced with δ; all other variables are replaced with ε.
Let φ1 (resp. φ2) be the corresponding substitution witnessing
w2∈L(τ) (resp. w2∈L(π)).
Put (w1,+) and (w2,+) into T. Since, for every δ∈Σ,
w1 does not contain the subword δδ while w2 does not contain
the subword δδδ, and w1,w2 both start and end
with different letters, τ must be of the shape x1A1x2A2…Akxk+1,
where k≤n−1 and for all i∈{1,…,k}, Ai∈{0,1,01,10}.
Thus one may assume, without loss of generality, that τ is a simple block-regular
pattern.
For each position p of w2 such that τ[Iφ1,τ(p)]∈Σ but
π[Iφ2,π(p)]∈X, note that p≥2 and w2[p−1] must be equal to
w2[p], and since τ does not
contain a substring of the shape δδ for any δ∈Σ
(as observed earlier), it follows that τ[Iφ1,τ(p−1)]∈X.
Consequently, τ(ε)⊑π(ε). One may then conclude
from Lemma 2 that adding (π(ε),−) to
T ensures τ(ε)=π(ε). As τ is simple block-regular,
we have that L(τ)=L(π), as required.
We illustrate the construction of the teaching set in the proof of Lemma
5 with the following example.
Example D.1
Suppose Σ={0,1}. Let π=x1 0 x2 0 x3 1 x4 1 x5.
According to the construction in the proof of Lemma 5,
π has the teaching set {(w1,+),(w2,+),(w3,−)} w.r.t. RΠ^2, where
w1,w2 and w3 are defined as follows:
(θ1 and θ2 are substitutions witnessing w1∈L(π) and
w2∈L(π) respectively):
Proof. Suppose Σ={a1,a2,…,ak}, where k≥3,
and π=x1ai1x2ai2…ain−1xn, where
x1,x2,…,xn∈X. If n=2, then one may verify
directly that for any b∈Σ∖{ai1},
{(ai1,+),(bai1b,+),(ε,−)} is a teaching set for
π w.r.t. RΠz. We assume in what follows that
n≥3. Again, T={(w1,+),(w2,+),(w3,−)}
will denote a teaching set for π w.r.t. RΠz, where
w1,w2 and w3 are defined below. Further, τ will denote a
regular pattern that is consistent with T.
w1:
For every substring of π of the shape
aijxj+1aij+1, define φ(xj+1)
according to the following case distinction.
Case i:
ij and ij+1 have opposite parities.
Set φ(xj+1)=ε.
Case ii:
ij and ij+1 have equal parities.
Fix some j′∈{1,…,k} such that j′ and ij
have opposite parities (which implies that j′ and ij+1
also have opposite parities), and set φ(xj+1)=aj′.
For all other variables x occurring in π, set φ(x)=ε.
Set w1=φ(π).
w2:
For every substring of π of the shape
aijxj+1aij+1, define ψ(xj+1)
according to the following case distinction.
Case i:
ij and ij+1 have equal parities.
Set ψ(xj+1)=ε.
Case ii:
ij is even and ij+1 is odd.
Case ii.1:
j>1 and ij−1 is even.
Pick any odd j′∈{1,…,k}
such that aj′≠aij+1, and set ψ(xj+1)=aj′.
Case ii.2:
j>1 and ij−1 is odd, or j=1.
Pick any even j′∈{1,…,k} and pick any odd j′′∈{1,…,k} such that aj′′≠aij+1, and set ψ(xj+1)=aj′aj′′.
Case iii:
ij is odd and ij+1 is even.
Pick any odd j′∈{1,…,k} such that aj′≠aij, and set ψ(xj+1)=aj′.
Furthermore, pick j1,j2∈{1,…,k} such that
aj1∉{ai1,ai2} and aj2∉{ain−1,ain−2};
set ψ(x1)=aj1 and ψ(xn)=aj2 (such j1 and j2 must exist since ∣Σ∣≥3).
For all other variables x occurring in π, set ψ(x)=ε.
Set w2=ψ(π).
w3:
Arguing as in the proof of Lemma 5,
the consistency of τ with (w1,+) and (w2,+) implies that τ
is of the shape x1A1x2A2…Ak−1xk, where every maximal
constant block Ai has length at most 2; furthermore, if Ai=aℓaℓ′,
then ℓ and ℓ′ have opposite parities.
Note that Lemma 2 cannot be directly applied
here since the consistency of τ with (w1,+) and (w2,+)
does not imply that τ is simple block-regular.
We will, however, give a different construction of w3 by analysing
a decomposition of w2 containing subwords β1,β2,…,βn−2
such that any maximal constant block of τ is a subword
of some βj (details are to follow).
For each j∈{1,…,n−2}, define βj\mathchar58=aijψ(xj+1)aij+1.
The positions of β1,…,βn−2 are illustrated below.
[TABLE]
Corresponding to each βj, where j∈{1,…,n−2}, we define
a word αj based on the following case distinction.
Case i:
βj=aijaij+1, where ij and ij+1
have equal parities.
Case i.1:
ij and ij+1 are even.
Case i.1.1:
j−1≥1 and ij−1 is odd, j+2≤n−1
and ij+2 is odd.
Then ψ(xj)=aj′ for some odd j′ such that aj′≠aij−1
and ψ(xj+2)=aj′′ for some odd j′′ such that aj′′≠aij+2.
Set
[TABLE]
Case i.1.2:
j−1≥1 and ij−1 is odd; either
j+2≤n−1 and ij+2 is even, or j+2>n−1.
Then ψ(xj)=aj′ for some odd j′ such that aj′≠aij−1.
If j+2≤n−1 and ij+2 is even, define αj as in Case i.1.1 but
with all occurrences of aj′′ deleted.
If j+2>n−1, define αj as in Case i.1.1 but with all occurrences of
aj′′ replaced with ψ(xn) and ψ(xn) appended to αj.
Case i.1.3:
j+2≤n−1 and ij+2 is odd; either
j−1≥1 and ij−1 is even, or j−1<1. Then
ψ(xj+2)=aj′′ for some odd j′′ such that aj′′≠aij+2.
If j−1≥1 and ij−1 is even, define αj as in Case i.1.1 but
with all occurrences of aj′ deleted. If j−1<1,
define αj as in Case i.1.1 but with all occurrences of aj′
replaced with ψ(x1) and ψ(x1) prepended to αj.
Case i.1.4:
j−1≥1 and ij−1 is even, or j−1<1;
j+2≤n−1 and ij+2 is even, or j+2>n−1.
If j−1≥1,j+2≤n−1 and both ij−1,ij+2 are even, set
[TABLE]
If j−1<1, set
[TABLE]
If j+2>n−1, set
[TABLE]
Case i.2:
ij and ij+1 are odd.
If j−1≥1 and j+2≤n−1, set
[TABLE]
If j−1<1, set
[TABLE]
If j+2>n−1, set
[TABLE]
Case ii:
ij is odd and ij+1 is even.
Case ii.1:
j+2≤n−1 and ij+2 is odd;
j−1≥1 and ij−1 is even.
Suppose βj=aijaj1aij+1 and βj+1=aij+1aj2aj3aij+2 for some even j2 and odd j1 and j3,
where aj1≠aij and aj3≠aij+2.
Set
[TABLE]
Case ii.2:
j+2≤n−1 and ij+2 is odd; either
j−1≥1 and ij−1 is odd, or j−1<1.
If j−1≥1 and ij−1 is odd, define αj as in Case ii.1
(note that ψ(xj)=ε in this case).
If j−1<1, define αj as in Case ii.1 but with ψ(x1)
prepended to αj.
Case ii.3:
j−1≥1 and ij−1 is even; either
j+2≤n−1 and ij+2 is even, or j+2>n−1.
Suppose βj=aijaj1aij+1
for some odd j1 such that aj1≠aij.
If j+2≤n−1 and ij+2 is even, set αj=aj1aij+1ψ(xj)aijaj1. If j+2>n−1, set αj=aj1aij+1ψ(xn)ψ(xj)ψ(xn)aijaj1ψ(xn).
Case ii.4:
j−1≥1 and ij−1 is odd, or j−1<1;
j+2≤n−1 and ij+2 is even, or j+2>n−1.
Suppose βj=aijaj′aij+1, where j′ is
odd and aj′≠aij.
If j−1≥1, ij−1 is odd, j+2≤n−1
and ij+2 is even, set αj=aj′aij+1aijaj′.
If j−1<1, set αj=ψ(x1)aj′aij+1ψ(x1)aijaj′.
If j+2>n−1, set αj=aj′aij+1ψ(xn)aijaj′ψ(xn).
Case iii:
ij is even and ij+1 is odd.
Case iii.1:
βj=aijaj1aj2aij+1
for some even j1 and odd j2 such that aj2≠aij+1.
Set
[TABLE]
Case iii.2:
βj=aijaj2aij+1 for some
odd j2 such that aj2≠aij+1 (note that if j−1≥1,
then ij−1 is even and so ψ(xj)=ε).
Define αj as in Case iii.1, but with all occurrences of aj1 deleted.
Set w3:=α1α2…αn−2.
By construction, w1∈L(π) and w2∈L(π).
Furthermore, induction on j=1,…,n−2 shows that
the longest prefix of x1ai1x2ai2x3…ain−1xn
matching α1…αj is x1ai1x2…aijxj+1.
Hence w3∉L(π). The lemma will follow from the next two claims.
Claim E.1
Suppose h,g:(X∪Σ)∗↦Σ∗ are constant-preserving morphisms witnessing
w1∈L(τ) and w2∈L(τ) respectively, and
suppose π(ε)=ai1ai2…ain−1⊑τ(ε).
Let ⟨p1,p2,…,pn−1⟩ be a sequence of positions of τ
such that τ[pj]=aij for all j∈{1,…,n−1}.
For each j∈{1,…,n−1}, let qj be the position of w1 occupied by
the specific occurrence of aij indicated with braces in Equation (5).
[TABLE]
Similarly, let Rj be the sequence of positions of w2 indicated with braces in
Equation (6).
[TABLE]
Let Ih,τconst (resp. Ig,τconst) be the mapping of sequences
of positions of constants in τ to sequences of positions of w1
(resp. w2) induced by h (resp. g).
Then for all j∈{1,…,n−1}, Ih,τconst(⟨pj⟩)=⟨qj⟩ and Ig,τconst(⟨pj⟩)
is a subsequence of Rj.
In particular, if ai1ai2…ain−1⊑τ(ε), then
L(τ)=L(π).
Claim E.2
Let η be any regular pattern such that {w1,w2}⊂L(η) and
ai1ai2…ain−1⋢η(ε).
Then w3∈L(η).
Proof of Claim E.1. Let P1,P2,…,Pn−1 denote
the sequences of positions of w1 indicated by braces in Equation (7).
[TABLE]
It suffices to show that whenever j∈{1,…,n−1}, Ih,τconst(⟨pj⟩)
is a subsequence of Pj; the claim that Ih,τconst(⟨pj⟩)=⟨qj⟩
will then follow from the fact that φ(xj)∉N(xj,π) for all j∈{1,…,n−1}.
So assume, by way of contradiction, that there were a least ℓ∈{1,…,n−1} such that
Ih,τconst(⟨pℓ⟩) is not a subsequence
of Pℓ. First, suppose that Ih,τconst(⟨pℓ⟩)
were a subsequence of some Pℓ′ with ℓ′<ℓ. Then,
since φ(xℓ)∉{aiℓ,aiℓ−1} and
φ(xℓ−1)=aiℓ−1, Ih,τconst(⟨pℓ−1⟩)
is not a subsequence of Pℓ−1. Iterating the preceding argument
then gives that for all j≤ℓ, Ih,τconst(⟨pj⟩)
is not a subsequence of Pj, a contradiction. A similar argument holds if Ih,τconst(⟨pℓ⟩) were
a subsequence of some Pℓ′′ with ℓ′′>ℓ.
The proof that Ig,τconst(⟨pj⟩) is a subsequence of Rj is similar (making crucial use of the definition of ψ).
This establishes the first part of the claim.
Now we establish the second part of the claim. Note that from the first
part of the claim, if ij is odd, then Ig,τconst(⟨pj⟩)
cannot be a subsequence of the sequence of positions of w2
corresponding to ψ(xj) (resp. ψ(xj+1)).
If ij is even, then Ig,τconst(⟨pj⟩) cannot
be a subsequence of the sequence of positions of w2 corresponding
to ψ(xj). Furthermore, suppose Ig,τconst(⟨pj⟩)
were a subsequence of the sequence of positions of w2 corresponding
to ψ(xj+1); then if j+1≤n−1, ij+1 must be odd and
therefore Ig,τconst(⟨pj+1⟩) equals ⟨q′⟩,
where q′ is the position of w2 occupied by aij+1 in
Rj+1.
From the fact that {w1,w2}⊂L(τ), we know that
τ must start as well as end with variables.
For any α∈(X∪Σ)∗, let o(α) denote
the number of substrings of α of the shape b1xb2,
where x∈X∪{ε}, b1,b2∈Σ and
b1,b2 have opposite parities. Note that o(w2)=o(π).
Since Ih,τconst(⟨pj⟩)=⟨qj⟩
whenever j∈{1,…,n−1}, it follows that if
ai1…ain−1⊏τ(ε), then there is some
position p′ of τ such that for some j∈{1,…,n−2},
pj<p′<pj+1 and τ[p′]=φ(xij+1)∈Σ.
By the definition of φ, if φ(xij+1)=aj′, then
j′ has parity opposite to that of ij as well as ij+1. Thus
o(τ)>o(π). But w2∈L(τ) implies o(τ)≤o(π),
and therefore τ(ε)=ai1…ain−1.
The fact that w1∈L(τ) (resp. w2∈L(τ)) implies
that a variable occurs in τ between every pair aij,aij+1 such
that ij and ij+1 have equal (resp. opposite) parities.
Thus L(τ)=L(π). (Claim E.1)
Proof of Claim E.2.
Our strategy to show w3∈L(η) is as follows.
First, fix some constant-preserving morphism g:(X∪Σ)∗↦Σ∗
such that g(η)=w2. Then g induces a mapping
I_{g,η} of closed intervals of {1,…,∣η∣} to
closed intervals of {1,…,∣w2∣} such that for all
[p1,p2]⊆{1,…,∣η∣}, g(η[p1]…η[p2])=w2[I_{g,η}([p1,p2])]. One may take the “inverse”
Ī_{g,η} of I_{g,η}, where, for all [q1,q2]⊆{1,…,∣w2∣},
Ī_{g,η}([q1,q2])=[s1,s2] for some s1,s2∈{1,…,∣η∣}
such that [q1,q2] is a subinterval of I_{g,η}([s1,s2])
and for all proper subintervals R of [s1,s2], [q1,q2] is not a subinterval of
I_{g,η}(R).
Let r1,…,rn−1 be the positions of ai1,…,ain−1
respectively in w2 marked with braces in Equation (8).
[TABLE]
By our assumption on η, there is a least ℓ∈{1,…,n−1}
such that η[Ī_{g,η}([rℓ,rℓ])] is a variable and
if there is a least r′>rℓ such that η[Ī_{g,η}([r′,r′])] is a constant,
then η[Ī_{g,η}([r′,r′])]=aiℓ.
As argued at the beginning of the construction of w3,
η starts and ends with variables, and every maximal constant
block A of η has length at most 2; furthermore, if the
length of A is exactly 2, then A=aj1aj2
for some j1,j2∈{1,…,k} such that j1 and j2 have
opposite parities.
We define a set C consisting of all possible intervals of positions of w2 of length at most 2
such that for every maximal constant block of η, say η[J] for some
closed interval J⊆{1,…,∣η∣}, there is an I∈C
for which Ig,η(J)⊆I.
First, suppose iℓ is even and the first letter of ψ(xℓ+1)
equals aiℓ. Then C consists of all intervals of
positions of w2 of the form
(i) [q,q+1], where q<rℓ−1 and w2[q]w2[q+1]=aj1aj2
for some j1,j2∈{1,…,k} with opposite parities, or
(ii) [q,q+1], where q>rℓ+1 and w2[q]w2[q+1]=aj1aj2 for some j1,j2∈{1,…,k}
with opposite parities, or
(iii) [q,q], where q<rℓ−1 and if q≥2, then w2[q−1]w2[q]w2[q+1]=baj3aj4 for some b∈Σ
and j3,j4∈{1,…,k} such that j3 and j4 have equal parities,
and b=aj5 for some j5∈{1,…,k} such that
j5 and j3 have equal parities; if q<2, then the same holds with
w2[q−1] and b replaced with ε, or
(iv) [q,q], where q=rℓ−1 and if q≥2, then w2[q−1]w2[q]=baj6
for some j6∈{1,…,k} and b∈Σ such that
if b=aj7 for some j7∈{1,…,k},
then j7 and j6 have equal parities; if q<2, then the same holds with
w2[q−1] and b replaced with ε, or
(v) [q,q] for some q>rℓ+2 such that if q+1≤∣w2∣, then w2[q−1]w2[q]w2[q+1]=aj8aj9b for some j8,j9∈{1,…,k}
with equal parities and b∈Σ such that if b=aj10 for some
j10∈{1,…,k}, then j10 and j9 have equal parities;
if q+1>∣w2∣, then the same holds with w2[q+1] and b replaced with ε,
or
(vi) [q,q], where q=rℓ+2 and if q+1≤∣w2∣, then w2[q]w2[q+1]=aj11b
for some j11∈{1,…,k} and b∈Σ
such that if b=aj12 for some j12∈{1,…,k}, then
j12 and j11 have equal parities; if q+1>∣w2∣, then
the same holds with w2[q+1] and b replaced with ε.
Second, suppose either iℓ is odd or the first letter of
ψ(xℓ+1) is not equal to aiℓ.
Then we define C exactly as above but with three differences:
first, q>rℓ+1 is replaced with q>rℓ in (ii); second, q>rℓ+2 is
replaced with q>rℓ+1 in (v); third, q=rℓ+2 is replaced
with q=rℓ+1 in (vi).
We next define a one-one mapping F from C to the set of all intervals
of positions of w3 satisfying the following conditions for all [q,q],[q,q+1]∈C:
•
F([q,q+1])=[q′,q′+1] for some q′∈{1,…,∣w3∣−1} with
w2[q]w2[q+1]=w3[q′]w3[q′+1].
•
F([q,q])=[q′,q′] for some q′∈{1,…,∣w3∣} with
w2[q]=w3[q′].
•
Suppose q1 and q2 are the left endpoints of I1 and
I2 respectively, where I1,I2∈C, I1≠I2 and q1<q2 (note that
no two distinct members of C intersect). Let q1′ and q2′
be the left endpoints of F(I1) and F(I2) respectively.
Then q1′<q2′ and F([I1])∩F([I2])=∅.
Note that the existence of an F satisfying the above three conditions implies
that for any sequence ⟨I1,I2,…,Im⟩ of intervals of positions
of w2 such that every Ii corresponds to a maximal constant block
of η and for all i,j∈{1,…,m} with i<j,
Ii∩Ij=∅, and the left endpoint of Ii is smaller than that
of Ij, there is a corresponding sequence ⟨I1′,I2′,…,Im′⟩
of intervals of positions of w3 such that for all i,j∈{1,…,m}
with i<j, w2(Ii)=w3(Ii′), Ii′∩Ij′=∅,
and the left endpoint of Ii′ is smaller than that of Ij′.
Thus, since η starts as well as ends with variables, the existence of
such an F will suffice to show that w3∈L(η). We consider
a case distinction based on the earlier definition of C. Let Q1,…,Qn−2 be the closed intervals of positions of w3
corresponding to the occurrences of α1,…,αn−2 respectively
as shown in Equation (9).
[TABLE]
Consider any I∈C.
Case 1:
I=[rj,rj+1] for some j<ℓ, where, if w2[rj,rj+1]=aj′aj′′
for some j′,j′′∈{1,…,k}, then j′ and j′′ have opposite parities.
Note that if ij were odd, then by Cases i and iii in the construction of w2,
w2[rj+1]=aj′ would imply that j′ is odd, which is impossible
by Conditions i and ii in the definition of C. Hence ij is even.
Furthermore, an inspection of Cases i and ii in the construction
of w2 shows that rj+1=rj+1, and therefore rj+1 is the position of
the first letter of ψ(xj+1) in w2; moreover, ij+1 is odd.
Suppose βj=aijaj1aij+1
for some odd j1 such that aj1≠aij+1 (the positions
of β1,…,βn−2 are illustrated in Equation (4)).
From Case iii.2 in the construction of w3, one sees that
αj=γaijaj1 for some γ∈Σ∗;
fix γ.
Set F(I)=[∑1≤l<j∣αl∣+∣γ∣+1,∑1≤l<j∣αl∣+∣γ∣+2].
Case 2:
I=[rj−1,rj] for some j<ℓ.
First, suppose j−1≥1. Then an argument similar to that in Case 1.1 shows that
ij must be even and ij−1 must be odd.
From Cases i.1.1 and iii in the construction of w3,
one sees that αj=γ1w2[rj−1]aijγ2
for some γ1,γ2∈Σ∗; fix such γ1 and γ2.
Set F(I)=[∑1≤l<j∣αl∣+∣γ1∣+1,∑1≤l<j∣αl∣+∣γ1∣+2].
Second, suppose j−1<1. From Cases i.1.3, i.1.4, i.2, ii.4 and iii
in the construction of w3, we deduce that
there are γ1,γ2∈Σ∗ such that
α1=γ1ψ(x1)ai1γ2; fix such γ1 and γ2.
Set F(I)=[∣γ1∣+1,∣γ1∣+2].
Case 3:
I=[rj+1,rj+2] for some j<ℓ
such that ψ(xj+1)=w2[rj+1]w2[rj+2].
Based on the case distinction in the construction of w2,
one sees that ij must be even and if j+2≤n−1,
then aij+1 must be odd.
From Case iii.1, we deduce that αj=γψ(xj+1)
for some γ∈Σ∗. Set
F(I)=[∑1≤l<j∣αl∣+∣γ∣+1,∑1≤l<j∣αl∣+∣γ∣+2].
Case 4:
I=[rj−1,rj] for some j>ℓ.
Arguing as in the earlier cases, ij must be even
and ij−1 must be odd. By examining Case ii
in the construction of w3, one sees that
αj−1=w2[rj−1]aijγ for some
γ∈Σ∗. Set F(I)=[∑1≤l<j−1∣αl∣+1,∑1≤l<j−1∣αl∣+2].
Case 5:
I=[rj,rj+1] for some j>ℓ.
First, suppose j+1≤n−1. Arguing as before, ij and ij−1 must be even while ij+1 must be odd.
It follows from Case i.1 in the construction of w3
that αj−1=γ1aijw2[rj+1]γ2 for some
γ1,γ2∈Σ∗; fix such γ1 and γ2.
Set F(I)=[∑1≤l<j−1∣αl∣+∣γ1∣+1,∑1≤l<j−1∣αl∣+∣γ1∣+2].
Second, suppose j+1>n−1, i.e. j=n−1. It follows from Cases
i.1.2 and i.1.4 in the construction of w3 that for some γ1,γ2∈Σ∗,
αn−2=γ1ain−1ψ(xn)γ2; fix
such γ1 and γ2. Set F(I)=[∑1≤l<j−1∣αl∣+∣γ1∣+1,∑1≤l<j−1∣αl∣+∣γ1∣+2].
Case 6:
I=[rj+1,rj+2] for some j>ℓ such
that ψ(xj+1)=w2[rj+1]w2[rj+2].
Based on the case distinction in the construction of w2,
we deduce that ij−1 and ij+1 are odd while
ij is even. It follows from Case ii in the construction of
w3 that αj−1=γ1aijγ2w2[rj+1]w2[rj+2]γ3 for some γ1,γ2,γ3∈Σ∗;
fix such γ1,γ2 and γ3. Set
F(I)=[∑1≤l<j−1∣αl∣+∣γ1∣+∣γ2∣+2,∑1≤l<j−1∣αl∣+∣γ1∣+∣γ2∣+3].
Case 7:
I=[rj,rj] for some j<ℓ.
Case 7.1:
ij is even.
First, suppose j−1≥1.
Then both ij−1 and ij+1 must be even.
From Case i.1 in the construction of w3, we deduce
that there exist γ1,γ2∈Σ∗ such that
αj=γ1aijγ2 and ∣γ2∣≤1;
fix such γ1 and γ2.
Set F(I)=[∑1≤l<j∣αl∣+∣γ1∣+1,∑1≤l<j∣αl∣+∣γ1∣+1].
Second, suppose j−1<1. It follows from Cases i.1
and iii in the construction of w3 that there exist
γ1,γ2∈Σ∗ such that α1=γ1ai1γ2
and ∣γ2∣≤2; fix such γ1 and γ2.
Set F(I)=[∣γ1∣+1,∣γ1∣+1].
Case 7.2:
ij is odd.
It follows from Cases i.2 and ii in the construction of w3 that
there exist γ1,γ2∈Σ∗ such that
αj=γ1aijγ2 and ∣γ2∣≤1;
fix such γ1 and γ2. Set F(I)=[∑1≤l<j∣αl∣+∣γ1∣+1,∑1≤l<j∣αl∣+∣γ1∣+1].
Case 8:
I=[rj,rj] for some j>ℓ.
From the case distinction in the construction of w3, we deduce
that there exist γ1,γ2∈Σ∗ with ∣γ1∣≤1
such that αj−1=γ1aijγ2; fix such γ1
and γ2. Set F(I)=[∑1≤l<j−1∣αl∣+∣γ1∣+1,∑1≤l<j−1∣αl∣+∣γ1∣+1].
Case 9:
I=[1,1].
Observe from the construction
of w3 that α1 starts with ψ(x1). Set F(I)=[1,1].
Case 10:
I=[∣w2∣,∣w2∣].
Observe from the construction
of w3 that αn−2 ends with ψ(xn).
Set F(I)=[∣w3∣,∣w3∣].
This completes the definition of F. By Claim E.2, since {w1,w2}⊂L(τ) and
w3∉L(τ), one has that ai1ai2…ain−1⊑τ(ε).
Thus by Claim E.1, L(τ)=L(π).
Therefore T={(w1,+),(w2,+),(w3,−)} is indeed a teaching set
for π w.r.t. RΠz.
We give an example to illustrate the construction of the teaching set
in the proof of Lemma 6.
Example F.1
Suppose Σ={0,1,2}. Following the notation of Lemma 6,
set a1=0,a2=1 and a3=2. Let π=x10x21x32x41x51x6.
According to the construction in the proof of Lemma 6,
π has the teaching set {(w1,+),(w2,+),(w3,−)} w.r.t. RΠ3, where
w1,w2 and w3 are defined as follows (φ,ψ and αi are defined
as in the proof of Lemma 6):
Proof. Note that for any 0∈Σ, x1 has the teaching set {(ε,+),(0,+)}.
Now suppose π contains at least one constant symbol.
Assertion (i).
If Σ={0}, then there is some m≥1 such that π is equivalent
to the pattern 0mx1, and so π may be taught with the examples (0m,+),(0m+1,+) and (0m−1,−).
If ∣Σ∣=∞, then one can choose distinct constants a1,a2,…,an∈Σ∖{c1,…,cn−1}.
Any pattern τ consistent with the examples (π(ε),+) and
(π[x1→a1,x2→a2,…,xn→an],+) must be simple block-regular
and satisfy τ(ε)⊑π(ε). By Lemma 2,
the example (π(ε),−) will ensure, in addition,
that τ(ε)⊏π(ε).
Finally, note that any simple block-regular pattern not equivalent to x1
must be taught using at least 3 examples (for a similar proof,
see [7, Theorem 12.1]).
Assertion (ii).
First, suppose Σ={a1,a2,…,aℓ} for some ℓ≥3.
We show that any teaching set for π w.r.t. Π^ℓ must contain at least
⌊n/ℓ⌋ positive examples.
Assume that some teaching set T for π w.r.t. Π^ℓ contains k positive examples (w1,+),…,(wk,+) for some k≥1. For each i∈{1,…,k}, fix a substitution hi:X↦Σ∗ such
that hi(π)=wi. Let {z^i_j : i,j∈N} be a subset of X such that
z^i_j≠z^{i′}_{j′} whenever (i,j)≠(i′,j′).
For each i∈{1,…,k}, let gi:Σ∗↦X∗
be a morphism such that gi(aj)=z^i_j for all j∈{1,…,ℓ}.
Let π′ be the pattern derived from π by replacing each
x∈Var(π) with the string g1(h1(x))g2(h2(x))…gk(hk(x));
π′ can be written in the form A1c1A2…cn−1An,
where A1,A2,…,An∈{z^i_j : 1≤i≤k∧1≤j≤ℓ}∗.
By construction, wi∈L(π′) for all i∈{1,…,k}.
In particular, note that if πi′ is the restriction of π′ to
{z^i_1,…,z^i_ℓ}∪Σ, then wi∈L(πi′).
Furthermore, since π′ is similar to π,
one has L(π′)⊆L(π)
and so π′ is consistent with T. As T is a teaching set for
π w.r.t. Πℓ, L(π′)=L(π) and therefore every Ai
contains at least one free variable. Hence
∣{x : x is a free variable of π′}∣ ≥ n.   (10)
On the other hand, since {x : x is a free variable of π′}⊆{z^i_j : 1≤i≤k∧1≤j≤ℓ},
we have
∣{x : x is a free variable of π′}∣ ≤ ℓk.   (11)
It now follows from Equations (10) and (11)
that n≤ℓk, and therefore k≥⌊n/ℓ⌋,
as required.
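The construction of π′ in this argument can be made concrete on a small instance. The Python sketch below is added for illustration and rests on assumptions stated here: patterns are lists of tokens, the fresh variable z^i_j is written `z{i}_{a}` with the letter a in place of its index j, a free variable is taken to mean a variable occurring exactly once in the pattern (the definition lies outside this appendix), and the sample pattern and substitution are hypothetical. With ℓ=3 letters and a single positive example (k=1), π′ has at most ℓk=3 distinct variables, so it cannot place a free variable in each of the n=5 variable blocks.

```python
from collections import Counter

def build_pi_prime(pattern, substitutions):
    """Replace every variable x of the pattern by g_1(h_1(x)) ... g_k(h_k(x)),
    where g_i sends each letter a of h_i(x) to the fresh variable z{i}_{a}."""
    out = []
    for tok in pattern:
        if tok.startswith("x"):
            for i, h in enumerate(substitutions, start=1):
                out.extend(f"z{i}_{a}" for a in h.get(tok, ""))
        else:
            out.append(tok)
    return out

def free_variables(pattern):
    """Variables occurring exactly once (the assumed meaning of 'free')."""
    counts = Counter(t for t in pattern if t.startswith(("x", "z")))
    return {v for v, c in counts.items() if c == 1}

# Hypothetical instance over the alphabet {0,1,2}: n = 5 variable blocks.
PI = ["x1", "0", "x2", "1", "x3", "2", "x4", "0", "x5"]
H1 = {"x1": "1", "x2": "2", "x3": "0", "x4": "1", "x5": "2"}   # one positive example

pi_prime = build_pi_prime(PI, [H1])
free = free_variables(pi_prime)
distinct = {t for t in pi_prime if t.startswith("z")}
print(pi_prime, free, len(distinct))
```

In this instance only the third block contains a free variable, in line with the counting argument that a single positive example cannot suffice when n>ℓ.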
The proof for binary alphabets is similar. Suppose Σ={0,1}.
Define an operation O on any τ∈RΠ^2 as follows:
pick the first occurrence of a substring of τ of the shape xδx′δ̄x′′,
where x,x′,x′′∈X and δ∈Σ, and delete x′. If no such
substring occurs in τ, set O(τ)=τ.
Then for all τ∈RΠ2, one has O(τ)=τ′
for some τ′∈RΠ2 with L(τ′)=L(τ) [29, Lemma 2].
We iteratively apply O to π until no new regular pattern is
produced; that is to say, we find the least k such that Ok+1(π)=Ok(π). Setting τ′=Ok(π), notice that for
all η∈Π2 with η similar to τ′ and L(η)=L(τ′),
every maximal variable block of η must contain a free variable.
To see this, let η=A1c1…cn−1An and
τ′=x1c1…cn−1xn (after normalisation of τ′),
where A1,…,An∈X∗ and c1,…,cn−1∈{0,1,01,10}.
Choose some δ1∈Σ that differs from the first symbol of
c1, and set w1=τ′[x1→δ1]. Since L(η)=L(τ′),
we have w1∈L(η) and therefore A1 must contain a free variable.
A similar argument shows that An contains a free variable.
Now consider any i∈{2,…,n−1}. If N(xi,τ′)={δ}
for some δ∈Σ, then setting wi=τ′[xi→δ]
gives wi∈L(τ′)=L(η) and so Ai must contain a free
variable. If N(xi,τ′)={0,1}, then at least one of ci−1 and ci,
say ci−1, equals δδ̄ for some δ∈Σ.
Pick δ′∈Σ that differs from the first symbol of ci
(if ci=δδ̄ instead, let δ′∈Σ be a
letter that differs from the last symbol of ci−1).
Setting wi=τ′[xi→δ′] then gives wi∈L(η), and
so Ai contains a free variable.
The proof for the case ∣Σ∣≥3 may now be applied to τ′.
Note that ∣Var(τ′)∣≥⌊2n/3⌋,
and so the earlier proof gives that every teaching set for τ′ w.r.t. Π^2 must contain at least ⌊n/3⌋
positive examples.
We exhibit a family of simple block-regular patterns for which the
lower bound given in Theorem 7(ii)
is tight (up to numerical constant factors).
Suppose z=|Σ|≥2 and 0,1∈Σ.
For all n∈N, let π_n be the simple block-regular
pattern x_1 0 x_2 0 … 0 x_{n+1}; in particular, π_n(ε)=0^n.
We construct a teaching set T for π_n w.r.t. Π^z
as follows. Let τ denote any pattern that is consistent with T.
First, put (0^n,+) into T; this example ensures that τ(ε)⊑0^n. Next, for each k∈{0,…,n−1}, put (0^k,−)
into T. The examples put into T so far ensure that τ(ε)=0^n.
Now for all i∈{1,…,n+1}, put (π_n[x_i→1, x_j→ε, j∈{1,…,n+1}∖{i}],+) into T. The last set of examples
ensures that every maximal variable block of τ contains at
least one free variable. Thus L(τ)=L(π_n), and this proves
that π_n has a teaching set w.r.t. Π^z of size O(n).
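The construction above can be sketched in a few lines of Python. This is only an illustration: the function names `teaching_set` and `in_language` are hypothetical, and the membership test relies on the observation (an assumption of this sketch, valid for erasing substitutions) that a word belongs to L(π_n) exactly when it contains at least n occurrences of 0.

```python
def teaching_set(n):
    """Build the sample T for pi_n = x_1 0 x_2 0 ... 0 x_{n+1} as described above."""
    sample = [("0" * n, "+")]                        # forces tau(eps) to be a subsequence of 0^n
    sample += [("0" * k, "-") for k in range(n)]     # rules out shorter constant parts
    for i in range(1, n + 2):
        # pi_n[x_i -> 1, all other variables -> eps]:
        # i-1 zeros precede x_i and n-(i-1) zeros follow it
        sample.append(("0" * (i - 1) + "1" + "0" * (n - i + 1), "+"))
    return sample


def in_language(word, n):
    """Membership in L(pi_n) under erasing substitutions (assumed characterisation)."""
    return word.count("0") >= n


if __name__ == "__main__":
    T = teaching_set(3)
    print(len(T))  # 2n + 2 examples, i.e. O(n)
    print(all(in_language(w, 3) == (label == "+") for w, label in T))
```

The sample has 1 + n + (n+1) = 2n+2 examples, matching the O(n) bound stated above.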
Proof. Given that L(τ)⊈L(π) and τ(ε)=π(ε),
both τ and π contain at least one variable, and so there is
some S⊆Var(τ) of minimum possible size such that
L(τ|_{Σ∪S})⊈L(π). Fix such an S.
By the choice of S, one has L(τ|_{Σ∪(S∖{y′})})⊆L(π) for all y′∈S. Fix any y∈S, and set S′:=S∖{y}.
Without loss of generality, assume S′={x_1,…,x_ℓ} (S′ may
also be empty). As noted earlier, L(τ|_{Σ∪S′})⊆L(π).
Now suppose, by way of contradiction, that |S|>1+(|π(ε)|+m+4)⋅|Var(π)|.
Let φ:X↦Σ∗ be the substitution defined by φ(x_i)=0 1^{2i⋅|τ|} 0 for all i∈{1,…,ℓ} and φ(z)=ε
for all z∈X∖{x_1,…,x_ℓ}. Set w:=φ(τ).
We first establish the following claim.
Claim I.1
For all i∈{1,…,ℓ}, w contains exactly m occurrences of
φ(x_i)=0 1^{2i⋅|τ|} 0. Furthermore, all m occurrences
of φ(x_i) are disjoint.
Proof of Claim I.1.
Fix any i∈{1,…,ℓ}. Since τ∈QRΠ^z_{∞,m}, there are at least
m occurrences of φ(x_i) in w. We show that there
cannot be any occurrence of φ(x_i) that overlaps with (i) a
constant part of τ, or (ii) an occurrence of φ(x_j) for some j∈{1,…,ℓ} such that φ(x_j) and φ(x_i) occupy different
intervals of positions of w.
Assume otherwise. Consider any j∈{1,…,ℓ}. Since
φ(x_j) starts and ends with 0, the occurrences of φ(x_j)
and φ(x_i) coincide or φ(x_j) overlaps with φ(x_i)
only at the first or last position of φ(x_i).
First, suppose an occurrence of φ(x_i) overlaps with a constant part
of τ. Since |0 1^{2i⋅|τ|} 0|>|τ|, this occurrence of φ(x_i)
must overlap with an occurrence of φ(xj) that is generated by a variable of
τ for some j∈{1,…,ℓ}.
By the observation in the preceding paragraph, since the occurrences of φ(xj)
and φ(xi) must be different, φ(xj) can overlap with φ(xi)
only at the first or last position of φ(xi). It follows that
each of the 2i⋅∣τ∣ occurrences of 1 in φ(xi) must overlap
with a constant part of τ, which is impossible as 2i⋅∣τ∣>∣τ∣.
Second, suppose an occurrence of φ(xi) overlaps with an occurrence of
φ(xj) for some j∈{1,…,ℓ} such that φ(xj) and
φ(xi) occupy different intervals of positions of w.
An argument similar to that in the preceding paragraph shows that
each of the 2i⋅∣τ∣ occurrences of 1 in φ(xi) must
overlap with a constant part of τ, which is impossible. (Claim I.1)
Let the variable part of τ|_{Σ∪S′} (i.e. τ|_{S′}) be x_{i_1}…x_{i_{mℓ}} (since
τ|_{Σ∪S′} has ℓ distinct variables, it has mℓ variable
occurrences).
Set c=∣π(ε)∣, and write w as
[TABLE]
where γ_1,…,γ_{mℓ+1}∈Σ∗, τ(ε)=γ_1γ_2…γ_{mℓ+1}
and J_1,H_1,…,J_{mℓ},H_{mℓ},J_{mℓ+1} are the intervals of positions of w
corresponding to the subwords marked in Equation (12).
Since L(τ|_{Σ∪S′})⊆L(π), there is a morphism
θ:X∗↦X∗ such that θ(π)=w.
We claim that for all j∈{1,…,mℓ−m−c−3},
Iθ,π maps the positions of at least two variable occurrences
of π to intervals of positions of w that overlap with the interval
corresponding to
[TABLE]
Formally, this means there are at least two positions of π occupied by variables,
say p1 and p2, such that
[TABLE]
for k∈{1,2}.
Suppose the latter statement does not hold. For all i∈{1,…,ℓ}, since |φ(x_i)|>|τ|≥|S|+c>m⋅|Var(π)|+c=|π|, no constant part of π
can cover φ(xi), and so there must be some
q∈{1,…,∣π∣} such that π[q] is a variable and Iθ,π(q)
covers Jj+1∪Hj+1∪…∪Hj+c+m+2∪Jj+c+m+3, i.e.
[TABLE]
Since every variable of π occurs exactly
m times, there must be at least m occurrences of
[TABLE]
in w. According to Claim I.1, φ(xi) occurs
exactly m times in w for all i∈{1,…,ℓ}, and all its m occurrences
are disjoint. Thus for all distinct j_1,j_2∈{j+1,…,j+c+m+2}, i_{j_1}≠i_{j_2}. Furthermore, since there are at most c indices i with γ_i≠ε,
w′ contains at least c+m+1−c=m+1 subwords of the shape φ(x_{i_{j_1}})φ(x_{i_{j_2}}),
where j_1≠j_2 and j_1,j_2∈{1,…,ℓ}. This means there are at least m+1 pairs
(j_1,j_2) with j_1≠j_2 and j_1,j_2∈{1,…,ℓ} such that
τ|_{Σ∪S′} contains exactly m occurrences of the substring x_{j_1}x_{j_2}.
Since y occurs exactly m times in τ|_{Σ∪S} (we recall that S=S′∪{y}),
there is at least one pair (k_1,k_2) with k_1≠k_2 and k_1,k_2∈{1,…,ℓ}
such that x_{k_1}x_{k_2} occurs exactly m times in τ|_{Σ∪S}.
But by Theorem 17, τ|_{Σ∪S} would then be equivalent to
τ|_{Σ∪(S∖{x_{k_2}})},
contradicting the minimality of |S|. Thus there are indeed at least 2 positions
of variables in π, say p_1 and p_2, such that (13) holds.
Arguing inductively, it follows that the number of variable occurrences of π (including
variable repetitions) is at least mℓ/(c+m+4). Consequently,
Proof. We first consider the case z=1. Suppose Σ={0}. Every language generated by a pattern in
QRΠ^1_{∞,m} is generated by a pattern of the shape 0^k x^m or 0^{k′}, where k∈N_0
and k′∈N.
Let π:=0^k x^m. If k≥m, then π can be taught using the sample {(0^k,+),(0^{k+m},+),(0^{k−m},−)}:
the two examples (0^k,+) and (0^{k−m},−) uniquely identify the constant part of π, while
(0^{k+m},+) distinguishes π from the constant pattern 0^k.
If k<m, then {(0^k,+),(0^{k+m},+)} is a teaching set for π: since k<m, (0^k,+) already
uniquely identifies the constant part of π, while as before (0^{k+m},+) ensures that π is not a constant
pattern. Let π′:=0^{k′}. Then {(0^{k′},+),(0^{k′+m},−)} is a teaching set for π′: the constant part
of any pattern τ consistent with (0^{k′},+) is equal to 0^{k′′} for some k′′≤k′; if L(τ)≠L(π′),
then τ contains a variable x such that for some i≥1 with k′′+mi=k′, 0^{k′} is obtained from τ by
substituting 0^i for x. Replacing x with 0^{i+1} yields 0^{k′+m}∈L(τ),
and so τ is inconsistent with (0^{k′+m},−).
In any one of the above cases, one has TD(π,QRΠ^1_{∞,m})≤3.
Furthermore, suppose η:=0^m x^m. Any teaching set for η must contain
at least one positive and one negative example since L(0^m)⊂L(η) and
L(η)⊂L(x^m); an additional positive example is needed to distinguish η
from all constant patterns. Hence TD(η,QRΠ^1_{∞,m})≥3.
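The unary case can be checked by brute force on a bounded range. The sketch below is illustrative only: it encodes a hypothesis as a hypothetical pair `(k, has_var)` standing for 0^k x^m (if `has_var`) or the constant pattern 0^k, identifies a word 0^n with its length n, and truncates the hypothesis space at a chosen `bound` (a safe truncation, since any pattern consistent with (0^k,+) has a constant part of length at most k).

```python
def member(n, hyp, m):
    """Is 0^n in the language of hyp = (k, has_var)?"""
    k, has_var = hyp
    if has_var:                       # 0^k x^m generates {0^(k + m*i) : i >= 0}
        return n >= k and (n - k) % m == 0
    return n == k                     # constant pattern 0^k


def sample_for(hyp, m):
    """The samples proposed in the proof, as (length, label) pairs."""
    k, has_var = hyp
    if not has_var:
        return [(k, True), (k + m, False)]
    sample = [(k, True), (k + m, True)]
    if k >= m:
        sample.append((k - m, False))
    return sample


def hypotheses(bound):
    for k in range(bound + 1):
        yield (k, True)
        if k >= 1:                    # constant patterns have k' >= 1
            yield (k, False)


def samples_are_teaching_sets(m, bound):
    """Check that each sample is consistent with exactly its own target."""
    space = list(hypotheses(bound))
    for target in space:
        sample = sample_for(target, m)
        consistent = [h for h in space
                      if all(member(n, h, m) == lab for n, lab in sample)]
        if consistent != [target]:
            return False
    return True


if __name__ == "__main__":
    print(all(samples_are_teaching_sets(m, 12) for m in range(1, 6)))
```

Every sample produced by `sample_for` has at most three examples, in line with the bound TD ≤ 3 derived above.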
Now suppose z≥2. Fix any π∈QRΠ^z_{k,m}. We build a teaching set T for π w.r.t. QRΠ^z_{∞,m}.
Let η denote any pattern in QRΠ^z_{∞,m} that is consistent with T.
First, put (π(ε),+) into T. Next, for every w⊏π(ε),
put (w,−) into T. The O(2^{|π(ε)|}) examples added to T up to the present stage ensure that
η(ε)=π(ε). By [28], there is a finite tell-tale set for π w.r.t. QRΠ^z_{∞,m},
that is, a finite set S⊆L(π) such that for all τ∈QRΠ^z_{∞,m},
one has S⊆L(τ)⊆L(π)⇒L(τ)=L(π); furthermore,
[28, Lemma 9] implies that this set S has size O(⌈D_1⋅(|π(ε)|+D_1⋅m)^{D_1⋅m}⌉),
where D_1:=(1/m)⋅(2|π|−|π(ε)|).
Put {(w′,+):w′∈S} into T. The examples in T now ensure that
η(ε)=π(ε) and L(η)⊄L(π). Thus if L(η)≠L(π), then L(η)⊈L(π).
Next, for each τ∈QRΠ^z_{1+(|π(ε)|+m+4)⋅|Var(π)|,m} such that
L(τ)⊈L(π) and τ(ε)=π(ε), pick some v_τ∈L(τ)∖L(π) and put (v_τ,−) into
T; note that there are O(D_2⋅(|π(ε)|+D_2⋅m)^{D_2⋅m}) many such τ (up to equivalence), where
D_2:=1+(|π(ε)|+m+4)⋅|Var(π)|. As was observed earlier,
if L(η)≠L(π), then L(η)⊈L(π), and so by Lemma 9,
η(ε)=π(ε) implies there is some τ′∈QRΠ^z_{1+(|π(ε)|+m+4)⋅|Var(π)|,m} with L(τ′)⊆L(η) and L(τ′)⊈L(π); the negative example (v_{τ′},−) would therefore
ensure that η is inconsistent with T. At this stage, T has altogether
O(2^{|π(ε)|}+D⋅(|π(ε)|+D⋅m)^{D⋅m}) examples,
where D:=max({(1/m)⋅(2⋅|π|−|π(ε)|),1+(|π(ε)|+m+4)⋅|Var(π)|}).
We first observe a basic fact about graph colourings.
We recall that for any finite, simple graph G=(V,E), the distance between any two vertices u and v,
denoted dG(u,v), is the length of a shortest path in G from u to v (or vice-versa;
if no such path exists, then dG(u,v)=∞),
and for any ℓ≥1, the ℓ-distance chromatic number of G, denoted χℓ(G),
is the smallest k for which there exists a k-colouring of G such that for any pair of vertices s,t of
G with dG(s,t)≤ℓ, s and t receive distinct colours; such a colouring is called
an ℓ-distance colouring of G [21, 24].
Lemma K.1
Let G=(V,E) be any finite, simple graph with vertex set V, edge set E
and maximum degree Δ(G). Then χ_2(G)≤Δ(G)^2+1; equality
occurs if G is the 5-cycle.
Proof. We note that χ_2(G) is equal to χ_1(G^2), the (ordinary) chromatic number of
the square of G; G^2 is the graph whose vertex set is equal to that
of G and for all distinct vertices v_1,v_2 of G, (v_1,v_2) is an edge of G^2 iff
d_G(v_1,v_2)≤2. The maximum degree of any vertex of G^2 is at most
Δ(G)+Δ(G)⋅(Δ(G)−1)=Δ(G)^2, and so by Brooks' theorem [21, Theorem 11],
χ_1(G^2)≤Δ(G)^2+1; equality occurs if G is the 5-cycle. (Lemma
K.1)
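As a sanity check on Lemma K.1, the following sketch computes a 2-distance colouring greedily and, for small graphs, the exact 2-distance chromatic number by exhaustive search. It is an illustration under stated assumptions: graphs are given as dictionaries mapping each vertex to its set of neighbours, the function names are hypothetical, and the greedy procedure is a stand-in that already achieves the bound Δ(G^2)+1 ≤ Δ(G)^2+1 without appealing to Brooks' theorem.

```python
from itertools import product


def square(adj):
    """Adjacency of G^2: u and v are adjacent iff 1 <= d_G(u, v) <= 2."""
    sq = {}
    for u in adj:
        near = set(adj[u])
        for v in adj[u]:
            near |= adj[v]
        near.discard(u)
        sq[u] = near
    return sq


def greedy_two_distance_colouring(adj):
    """Colour vertices one by one with the least colour unused within distance 2."""
    sq = square(adj)
    colour = {}
    for u in adj:
        used = {colour[v] for v in sq[u] if v in colour}
        colour[u] = next(c for c in range(1, len(adj) + 2) if c not in used)
    return colour


def chi2_bruteforce(adj):
    """Exact 2-distance chromatic number; feasible only for very small graphs."""
    sq = square(adj)
    verts = list(adj)
    for k in range(1, len(verts) + 1):
        for assignment in product(range(k), repeat=len(verts)):
            col = dict(zip(verts, assignment))
            if all(col[u] != col[v] for u in verts for v in sq[u]):
                return k
    return 0  # empty graph


if __name__ == "__main__":
    cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
    print(chi2_bruteforce(cycle5))                      # square of C5 is K5
    print(max(greedy_two_distance_colouring(cycle5).values()))
```

For the 5-cycle the square is the complete graph on five vertices, which is why the bound Δ(G)^2+1 = 5 is attained.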
Proof of Theorem 11.
If m=1, then QRΠ^z_{∞,m,cf} contains only the pattern x and so
TD(QRΠ^z_{∞,1,cf})=PBTD(QRΠ^z_{∞,1,cf})=0. Suppose m≥2.
Given π,τ∈QRΠ^z_{∞,m,cf} that are succinct, define π≺τ
iff |τ|<|π|. For any succinct pattern π∈QRΠ^z_{∞,m,cf} with
Var(π)={x_1,…,x_n}, define the adjacency graph of π, denoted
AG(π), to be the bipartite graph whose vertex set comprises two copies of Var(π),
one denoted Var(π)^L:={x_1^L,…,x_n^L} and the other denoted
Var(π)^R:={x_1^R,…,x_n^R}, such that an edge connects
x_i^L and x_j^R iff x_i x_j is a substring of π [26, Chapter 3].
We find the least k such that some k-colouring c:Var(π)^L∪Var(π)^R↦{1,…,k} of AG(π) satisfies the following conditions.
1. For all i∈{1,…,n}, c(x_i^L)=c(x_i^R).
2. For any distinct j_1,j_2∈{1,…,n}, if (x_i^L,x_{j_1}^R)∈E(AG(π))
and (x_i^L,x_{j_2}^R)∈E(AG(π))
(resp. (x_{j_1}^L,x_i^R)∈E(AG(π)) and (x_{j_2}^L,x_i^R)∈E(AG(π))),
then c(x_{j_1}^R)≠c(x_{j_2}^R)
(resp. c(x_{j_1}^L)≠c(x_{j_2}^L)).
We show that k≤4m^2+1. Let G be the graph obtained from AG(π)
by contracting the pair (x_i^L,x_i^R) of vertices for all i∈{1,…,n}
(i.e. the vertices x_i^L and x_i^R are replaced with a single vertex x_i such that x_i is
adjacent to any vertex to which x_i^L and x_i^R were originally adjacent) and
deleting all loops.
Choose the minimum k′ such that some
colouring c′:V(G)↦{1,…,k′} is a 2-distance colouring of G.
Let c′′:Var(π)^L∪Var(π)^R↦{1,…,k′} be the colouring
of AG(π) defined by c′′(x_i^L)=c′′(x_i^R)=c′(x_i) for all
i∈{1,…,n}.
Note that for any distinct j_1,j_2∈{1,…,n}, (x_i^L,x_{j_1}^R)∈E(AG(π)) and (x_i^L,x_{j_2}^R)∈E(AG(π))
(resp. (x_{j_1}^L,x_i^R)∈E(AG(π)) and (x_{j_2}^L,x_i^R)∈E(AG(π))) together imply
that d_G(x_{j_1},x_{j_2})≤d_{AG(π)}(x_{j_1}^R,x_{j_2}^R)≤2 (resp. d_G(x_{j_1},x_{j_2})≤d_{AG(π)}(x_{j_1}^L,x_{j_2}^L)≤2); hence c′′ satisfies Conditions 1 and 2 with c′′ in place
of c, and therefore k≤k′. Furthermore, Δ(G) is equal to the maximum, over all
i∈{1,…,n},
of the number of substrings of π of the shape x_j x_i or x_i x_{j′} (where j≠i and j′≠i); this is bounded above by 2m because every variable of
π occurs exactly m times. Thus by Lemma K.1,
k≤k′≤4m^2+1.
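The passage from the adjacency graph to a colouring satisfying Conditions 1 and 2 can be illustrated as follows. This is a minimal sketch, not the authors' procedure: a constant-free pattern is represented (hypothetically) as a list of variable indices, the contracted graph G is coloured greedily at distance 2 rather than optimally, and Condition 1 holds by construction because one colour is stored per variable.

```python
def adjacency_edges(pattern):
    """Edges (i, j) of AG(pi): x_i^L is joined to x_j^R iff x_i x_j is a substring."""
    return set(zip(pattern, pattern[1:]))


def contracted_graph(pattern):
    """Contract each pair (x_i^L, x_i^R) to a single vertex and delete loops."""
    adj = {v: set() for v in pattern}
    for i, j in adjacency_edges(pattern):
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def two_distance_colouring(adj):
    """Greedy colouring in which vertices at distance at most 2 get distinct colours."""
    colour = {}
    for u in adj:
        near = set(adj[u])
        for v in adj[u]:
            near |= adj[v]
        near.discard(u)
        used = {colour[v] for v in near if v in colour}
        colour[u] = next(c for c in range(1, len(adj) + 2) if c not in used)
    return colour


def satisfies_condition_2(pattern, colour):
    """All right-neighbours (resp. left-neighbours) of a variable have distinct colours."""
    edges = adjacency_edges(pattern)
    for i in set(pattern):
        right = [j for (a, j) in edges if a == i]
        left = [j for (j, b) in edges if b == i]
        for group in (right, left):
            if len({colour[j] for j in group}) != len(group):
                return False
    return True


if __name__ == "__main__":
    m = 3
    pattern = [1, 2, 3, 1, 3, 2, 2, 1, 3]     # every variable occurs exactly m = 3 times
    G = contracted_graph(pattern)
    c = two_distance_colouring(G)
    print(max(len(nbrs) for nbrs in G.values()) <= 2 * m)
    print(satisfies_condition_2(pattern, c), max(c.values()) <= 4 * m * m + 1)
```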
Fix distinct letters a_1,…,a_k∈Σ and any strictly increasing sequence
2<p_1<…<p_n of positive integers. For each i∈{1,…,n}, fix some ξ_i∈{1,…,k} such that ξ_i≠c(x_i). Let φ:X↦Σ∗ be the substitution
defined by φ(x_i)=a_{c(x_i)} a_{ξ_i}^{p_i} a_{c(x_i)} for all
i∈{1,…,n} and φ(x′)=ε for all x′∈X∖Var(π).
Set w:=φ(π). Thus if π=x_{l_1}x_{l_2}…x_{l_{n′}},
[TABLE]
Let τ be any succinct pattern in QRΠ^z_{∞,m,cf} such that
w∈L(τ) and τ⊀π. It will be argued that
L(τ)=L(π). Suppose ψ:X∗↦Σ∗ is a morphism witnessing
w∈L(τ). Let I_1,…,I_{n′} be the closed intervals corresponding
to the positions of the subwords of w marked with braces in (15).
We show that for each j∈{1,…,n′}, there is some j′∈{1,…,|τ|}
such that I_j⊆I_{ψ,τ}(j′), i.e. there is a single position
of τ that is mapped under ψ to a subword of w covering
a_{c(x_{l_j})} a_{ξ_{l_j}}^{p_{l_j}} a_{c(x_{l_j})}.
Assume otherwise; let i0∈{1,…,n′} be the least integer for
which the latter statement is false.
It follows that Ii0 contains a cut-point of w relative to (ψ,τ).
Further, one observes that Iψ,τ cannot map any single position
of τ to a proper superset of Ii for any given i∈{1,…,n′}:
Claim K.2
Fix any x∈Var(τ). For all i∈{1,…,n} and
j∈{1,…,k}, neither a_j a_{c(x_i)} a_{ξ_i}^{p_i} a_{c(x_i)} nor a_{c(x_i)} a_{ξ_i}^{p_i} a_{c(x_i)} a_j is a subword of ψ(x).
Proof of Claim K.2.
Suppose, by way of contradiction, that a_j a_{c(x_i)} a_{ξ_i}^{p_i} a_{c(x_i)} were a subword of ψ(x) for some x∈Var(τ).
Since a_{c(x_i)}≠a_{ξ_i} (by the choice of ξ_i),
p_i≥3 and p_{i′}≠p_{j′} for all distinct i′,j′∈{1,…,n},
there are exactly m (non-overlapping) occurrences of the word
a_{c(x_i)} a_{ξ_i}^{p_i} a_{c(x_i)} in w. Suppose
these occurrences are represented by the intervals Ij1,…,Ijm
of positions of w, where j1<…<jm. Hence if x occupies positions
q1,…,qm of τ, where q1<…<qm, then Ijℓ⊂Iψ,τ(qℓ) for all ℓ∈{1,…,m}.
As aj occupies the position just before the leftmost point of Ijℓ in
w for all ℓ∈{1,…,m}, x cannot be the first symbol of τ.
Thus there is some x_{j′}∈Var(τ)
with j′≠i such that j=c(x_{j′}), which means that
a_{c(x_{j′})} a_{c(x_i)} a_{ξ_i}^{p_i} a_{c(x_i)}
occurs exactly m times in w. Now there cannot be exactly
m occurrences of the substring x_{j′}x in τ; otherwise, the subpattern
obtained from τ by deleting all occurrences of x_{j′} would be equivalent
to τ, contradicting the succinctness of τ.
Therefore there must be some x_{j′′}∈Var(τ) (possibly equal to x) with
j′′≠j′ such that x_{j′′}x is a substring of τ,
and so a_{c(x_{j′′})} a_{c(x_i)} a_{ξ_i}^{p_i} a_{c(x_i)} must be a subword of w. However, by the choice of c (in particular,
Condition 2), c(x_{j′})≠c(x_{j′′}) and thus a_{c(x_{j′})} a_{c(x_i)} a_{ξ_i}^{p_i} a_{c(x_i)} cannot occur exactly m times in w, a contradiction.
An analogous proof shows that a_{c(x_i)} a_{ξ_i}^{p_i} a_{c(x_i)} a_j cannot be a subword of ψ(x) for any given
j∈{1,…,k}. (Claim K.2)
By Claim K.2 and the choice of i0 (which
implies, in particular, that Ii0 contains a cut-point),
Iψ,τ(⋃ℓ≤i0Iℓ)≥i0+1.
By applying Claim K.2 successively to
w(Ii0),w(Ii0+1),…,w(In′), it follows that for j=i0+1,i0+2,…,n′,
Iψ,τ(⋃ℓ≤jIℓ)≥j+1
and so ∣τ∣≥n′+1, implying that τ≺π, contrary
to assumption.
Consequently, for each j∈{1,…,n′}, there is some j′∈{1,…,|τ|}
such that I_j⊆I_{ψ,τ}(j′); by Claim K.2,
one also has I_{ψ,τ}(j′)⊆I_j. Thus, since
a_{c(x_{l_i})} a_{ξ_{l_i}}^{p_{l_i}} a_{c(x_{l_i})} occurs
exactly m times in w for all i∈{1,…,n′} and
the subword of w corresponding to the interval I_{i′} is different from that
corresponding to I_{i′′} whenever l_{i′}≠l_{i′′}, one has (after normalising τ and π)
π⊑τ. As |τ|≤|π|, it follows that
L(τ)=L(π), as required.
Remark K.3
The notion of the adjacency graph of a (constant-free) pattern was
introduced in the study of pattern avoidance [26, Chapter 3].
We do not know whether the lower bound on ∣Σ∣ in Theorem 11
is tight. The minimum number of colours needed to satisfy Conditions
1 and 2 in the proof of Theorem 11 might be smaller
than 4m^2+1; if so, this would give a reduction in the minimum alphabet
size needed for the theorem to hold. In fact, the upper bound on k
in the proof of Theorem 11 would still hold if the second condition on c
is weakened as follows:
if there are distinct j_1,j_2∈{1,…,n} such that
(x_i^L,x_{j_1}^R)∈E(AG(π)) and (x_i^L,x_{j_2}^R)∈E(AG(π))
(resp. (x_{j_1}^L,x_i^R)∈E(AG(π)) and (x_{j_2}^L,x_i^R)∈E(AG(π))),
then x_i^L (resp. x_i^R) is adjacent to at least two vertices that are assigned different
colours.
Proof. We first note that over a unary alphabet Σ={0}, any pattern of the shape
0^k x_1^m … x_n^m, where k≥0 and n≥1, is equivalent to 0^k x^m.
Given any patterns π and π′ of the shape 0^k x^m or 0^k, define
π≺π′ iff
1. π′ is a constant pattern and π contains at least one variable, or
2. both π and π′ are non-constant patterns and |π(ε)|<|π′(ε)|.
For any constant pattern π, a teaching set for π w.r.t. (QRΠ^1_{∞,m},≺)
is {(π,+)}: π is preferred to all non-constant patterns while any constant pattern
different from π cannot be consistent with (π,+).
For any pattern τ:=0^k x^m, where k≥0, a teaching set for τ
w.r.t. (QRΠ^1_{∞,m},≺) is {(0^k,+),(0^{k+m},+)}:
no constant pattern can be consistent with this sample; furthermore, since the constant
part of any pattern consistent with this sample has length at most |τ(ε)|, it follows
from Condition 2 above that τ is preferred to all τ′ such that τ′ is consistent with
the sample and L(τ′)≠L(τ).
To see that PBTD(QRΠ^z_{∞,m})≥2 for all z≥1, one may apply [17, Theorem 34];
according to this theorem, PBTD(QRΠ^1_{∞,m})>1 because QRΠ^1_{∞,m} contains all constant patterns
as well as infinitely many patterns that generate infinite languages.
Proof. Assertion (i).
Assume, by way of contradiction, that there is a least j_0 such that
h(x_{j_0}) does not satisfy the claim. It will be shown by induction that for every variable x of
π that does not lie to the left of x_{j_0}^{n_{j_0}}, h(x) ends with 0; since w ends with
1, this would contradict the fact that h(π)=w.
By the choice of x_{j_0}, h(x_{j_0}) has one of the following shapes: (1) 0^p for some p∈{1,…,ℓ},
(2) 0^{p′}1…1 0^{p′′}1 for some p′,p′′∈{1,…,ℓ} with p′′>p′, or (3) 0^{p′′′}1…1 0^{p′′′′} for some
p′′′,p′′′′∈{1,…,ℓ}. If h(x_{j_0}) has the shape given in (2), then, since x_{j_0} occurs at least
twice in π, w must contain a subword of the shape 0^{p′′}1 0^{p′}1 for some p′,p′′∈{1,…,ℓ}
with p′<p′′, which is impossible (as seen from the shape of w in Equation (2)).
Hence (1) or (3) holds, so the induction statement (i.e. that for every variable x of
π that does not lie to the left of x_{j_0}^{n_{j_0}}, h(x) ends with 0) holds for x=x_{j_0}.
Now consider any variable x of π that lies to the right of x_{j_0}^{n_{j_0}}.
By the induction hypothesis, it may be assumed that for every variable x′ of π lying to the right of x_{j_0}^{n_{j_0}}
and to the left of x, h(x′) ends with 0. If h(x) starts with 1, then, since x is repeated at least
once in π and every occurrence of 1 in w is preceded by 0, h(x) must end with 0.
Suppose h(x) starts with 0 and ends with 1. If x′ is the variable immediately preceding x in π, then
by the induction hypothesis, h(x′) is of the shape α0 for some α∈{0,1}∗; thus, since x occurs
at least twice in π,
if π′ denotes the suffix of π starting at the first occurrence of x, then h(π′) is of the shape 0^{p_0}1 0^{p_1}1β for some p_0,p_1∈{1,…,ℓ} with p_1>p_0 and some β∈{0,1}∗.
As p_1>p_0, h(x) cannot be equal to 0^{p_0}1, and therefore h(x) must be of the shape
0^{p_0}1…0^{p_2}1 for some p_2∈{1,…,ℓ} with p_2>p_0. But w does not contain
any subword of the shape 0^{p_2}1 0^{p_0}1 with p_2∈{1,…,ℓ} and p_0<p_2.
The latter contradiction implies that if h(x) starts with 0, then it must also end with 0. This completes
the induction step and establishes the claim.
Assertion (ii).
It suffices to show that for all j∈{0,…,k}, there is some
j′∈{1,…,ℓ} such that I_{h,π}(J_j)⊆I_{j′}. By Assertion (i), there are j′′∈{1,…,ℓ}
and i′′∈{1,…,i_{j′′}} such that h(x_j^{n_j})=(0^{j′′}1)^{i′′}. Furthermore, if j≥1, then
h(x_{j−1}^{n_{j−1}}) ends with 1. One observes from Equation (2)
that any occurrence of 0^{j′′}1 in w that starts after an occurrence of 1 or
is a prefix of w must belong to the interval I_{j′′}. Consequently, I_{h,π}(J_j)⊆I_{j′′}, as was to be
shown.
Proof. If m=1, then NCΠ^z_{∞,m} contains only the pattern x_1 (up to equivalence) and thus
TD(NCΠ^z_{∞,1})=PBTD(NCΠ^z_{∞,1})=0. Suppose m≥2.
Assertion (i).
Suppose Σ={0}. We identify every pattern language L(π) such that π=x_0^{n_0}…x_k^{n_k}
with its Parikh image {v⋅x:x∈N_0^{k+1}}, where v=(n_0,…,n_k).
Thus teaching NCΠ^z_{∞,m} is equivalent to teaching the class C_m:={{v⋅x:x∈N_0^k}:v∈{1,…,m}^k∧k∈N}. Since the PBTD is a lower bound for the TD,
it suffices to show that TD(L,C_m)=O(m) for all L∈C_m and PBTD(C_m)=Ω(m).
Let L={v⋅x:x∈N_0^{k+1}},
where v=(n_0,…,n_k)∈{1,…,m}^{k+1};
without loss of generality, it may be assumed that for all distinct i and j, n_i does not divide n_j
(otherwise, if n_i∣n_j, then the linear set L′ obtained from L by deleting the entry n_j from v
in the definition of L would be equal to L). It is shown that L can be taught w.r.t. C_m using at most m
examples. Let T be the sample consisting of all pairs (p,ℓ_p) such that p≤m and ℓ_p=+
if p∈L and ℓ_p=− if p∉L (that is, T consists of all examples for L in the
domain {0,1,2,…,m}). Consider any H∈C_m that is consistent with
T. Since {n_0,…,n_k}⊆L, the linearity of
H (resp. L) implies that L⊆H. Furthermore, pick {n′_0,…,n′_{k′}}⊆{1,…,m} so that H is equal to {w⋅x:x∈N_0^{k′+1}}
for w=(n′_0,…,n′_{k′}).
The consistency of H with T implies that {n′_0,…,n′_{k′}}⊆L
and hence H⊆L. Therefore H=L and so T is indeed a teaching set
for L w.r.t. C_m.
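The argument that labelling the domain {0,1,…,m} pins down a member of C_m can be checked exhaustively for small m. The sketch below is illustrative: members of C_m are represented (as an assumption of this encoding) by nonempty sets of generators drawn from {1,…,m}, membership is computed by dynamic programming up to a hypothetical cut-off `limit`, and the function names are not taken from the paper.

```python
from itertools import combinations


def generated(gens, limit):
    """Membership table on {0..limit} for the additive monoid generated by gens."""
    table = [False] * (limit + 1)
    table[0] = True
    for p in range(1, limit + 1):
        table[p] = any(g <= p and table[p - g] for g in gens)
    return table


def sample(gens, m):
    """All labelled examples for the target on the domain {0, 1, ..., m}."""
    table = generated(gens, m)
    return [(p, table[p]) for p in range(m + 1)]


def is_teaching_set(gens, m, limit):
    """True iff every member of C_m consistent with the sample equals the target up to limit."""
    target = generated(gens, limit)
    T = sample(gens, m)
    for r in range(1, m + 1):
        for other in combinations(range(1, m + 1), r):
            h = generated(other, limit)
            if all(h[p] == lab for p, lab in T) and h != target:
                return False
    return True


if __name__ == "__main__":
    m = 5
    ok = all(is_teaching_set(gens, m, 40)
             for r in range(1, m + 1)
             for gens in combinations(range(1, m + 1), r))
    print(ok)
```

The check mirrors the proof: a consistent hypothesis must contain the generators of the target, and its own generators must be labelled positive, so the two monoids coincide.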
Now it is shown that PBTD(C_m)=Ω(m).
We reuse the construction in the proof
of [18, Lemma 29]. Assume that m≥6, and set m′=⌊m/3⌋.
Let F be the class {⟨{m′}∪{p_i:1≤i≤m′−1}⟩:(∀i∈{1,…,m′−1})[p_i∈{m′+i,2m′+i}]}.
Note that F⊆C_m. Furthermore, every member of F is of the shape {0,m′}∪{p_i:1≤i≤m′−1}∪{x:x≥2m′}, where p_i∈{m′+i,2m′+i} for all i∈{1,…,m′−1}.
Thus the TD of every member of F is at least m′−1, and therefore PBTD(C_m)≥PBTD(F)≥m′−1. This establishes that TD(C_m)=Θ(m) and PBTD(C_m)=Θ(m).
Assertion (ii).
Suppose {0,1}⊆Σ. We first show that PBTD(NCΠ^z_{∞,m})=1.
Let ≺ be the preference relation on NCΠ^z_{∞,m}
defined according to the following hierarchy, in order of decreasing priority. Suppose
π and τ are non-cross patterns in canonical form belonging to NCΠ^z_{∞,m}.
(Here “prefer π to τ” means τ≺π.)
Rule 1:
With highest priority: prefer π to τ if L(π)≠L(x_0) and L(τ)=L(x_0).
Rule 2:
With second highest priority: suppose both π and τ contain at least two distinct variables; prefer π to τ if π has fewer variables than τ.
Rule 3:
With third highest priority: prefer π to τ if L(π)⊂L(τ).
Suppose π=x_0^{n_0}…x_k^{n_k},
where n_0,…,n_k∈N. If there is some i with n_i=1,
then π has the teaching set {(0,+)} w.r.t. NCΠ^z_{∞,m}.
Suppose now that n_i≥2 for all i. Let T={(w_1,+)}, where
[TABLE]
Let τ:=y_0^{m_0}…y_ℓ^{m_ℓ} denote any
pattern in NCΠ^z_{∞,m} that is consistent with T and τ⊀π.
By Rule 1, m_i≥2 for all i∈{0,…,ℓ}, that is, L(τ)≠L(x_0).
By Lemma 13, the consistency of τ with (w_1,+) implies that
τ is equivalent to x_0 or every variable of τ occurs at least twice and for each
j∈{0,…,k}, there are nonnegative integers s_{j,0},…,s_{j,l_j} and
i_{j,0},i_{j,1},…,i_{j,l_j}∈{0,…,ℓ} with
i_{j,h}<i_{j′,h′} whenever j<j′ or j=j′∧h<h′ such that
∑_{h=0}^{l_j} s_{j,h} m_{i_{j,h}}=n_j. In particular, since L(τ)≠L(x_0),
τ contains at least k+1 variables. By Rule 2, τ must contain exactly k+1
variables.
It follows that τ is equivalent to x_0^{n′_0}x_1^{n′_1}…x_k^{n′_k}, where,
for each i∈{0,…,k}, n′_i∣n_i. If there were a least i′∈{0,…,k} such
that n′_{i′}<n_{i′} (that is, n′_{i′} properly divides n_{i′}), then L(π)⊂L(τ) and
so τ≺π by Rule 3, contradicting the choice of τ. Thus n′_i=n_i for all i∈{0,…,k}
and therefore L(τ)=L(π), as required.
Next, it will be shown that TD(NCΠ^z_{∞,m}) is at most 2 plus the number of prime powers
(including primes) less than m; this is equal to 2+∑_{i=1}^{⌊log(m−1)⌋} ϱ((m−1)^{1/i}),
where ϱ(x) denotes the number of primes less than or equal to x.
As observed earlier, the pattern x_0 can be taught with the single
example (0,+). Suppose π=x_0^{n_0}…x_k^{n_k},
where n_i≥2 for all i∈{0,…,k}. We build a teaching set T consisting of the following
examples; η:=y_0^{m_0}…y_ℓ^{m_ℓ} will denote any non-cross pattern
(in canonical form) in NCΠ^z_{∞,m} that is consistent
with T. First, put (v_1,+) into T, where
[TABLE]
According to Lemma 13, the consistency of η with (v_1,+)
implies that for each j∈{0,…,k}, there are nonnegative integers s_{j,0},…,s_{j,l_j}
and i_{j,0},…,i_{j,l_j}∈{0,…,ℓ} such that i_{j,h}<i_{j′,h′}
iff j<j′ or j=j′∧h<h′, and ∑_{r=0}^{l_j} s_{j,r} m_{i_{j,r}}=n_j.
Second, define
[TABLE]
and put (v2,−) into T. Note that by Lemma 13, v2 is indeed
a negative example for π because any pattern π′ with v2∈L(π′) is equivalent
to x0 or it contains at least k+2 variables that occur at least twice. Furthermore, Lemma
13 also implies that η is not equivalent to x0 and
that η contains at most k+1 variables. Since the consistency of η with (v1,+)
implies that η contains at least k+1 distinct variables, it follows that η contains exactly
k+1 variables, each of which occurs at least twice. That is to say, η is of the shape
x_0^{n′_0}x_1^{n′_1}…x_k^{n′_k}, where, for each i∈{0,…,k}, n′_i∣n_i.
It remains to ensure that n′_i does not properly divide n_i for any i∈{0,…,k}.
Let {q0r0,…,qℓ′rℓ′} be the set of all prime powers that are maximal proper prime power factors of the ni’s; in other words, for every j∈{0,…,ℓ′},
there is some j0∈{0,…,k} with qjrj∣nj0 and qjrj=nj0 but
qjrj+1∤nj0. For each j∈{0,…,ℓ′}, let dj be the number of i’s between
0 and k (inclusive) such that q_j^{r_j} does not divide n_i, and set e_j=q_j^{r_j−1}⋅∏_{p is prime ∧ q_j≠p≤m} p^{⌊log(m)/log(p)⌋}.
Now define
[TABLE]
for every j∈{0,…,ℓ′}, and put (t_j,−) into T.
We first show that t_j∉L(π) for every j∈{0,…,ℓ′}. This will be achieved by means of
a proof by contradiction; assuming that t_j∈L(π), one can construct
a one-one mapping F from {1,…,d_j+1} to {i∈{0,…,k}:q_j^{r_j}∤n_i} as follows.
Given any i∈{1,…,d_j+1}, it follows from
Lemma 13 that there are nonnegative integers s_{i,0},…,s_{i,l_i} and u_{i,0},…,u_{i,l_i}∈{0,…,k} such that u_{i,g}<u_{i′,g′} iff i<i′
or i=i′ and g<g′, and ∑_{g=0}^{l_i} s_{i,g} n_{u_{i,g}}=e_j.
Note that since q_j^{r_j}∤e_j, there must exist a least h_i
such that q_j^{r_j}∤n_{u_{i,h_i}}. Define F(i)=u_{i,h_i}.
Then range(F)⊆{i∈{0,…,k}:q_j^{r_j}∤n_i}; furthermore,
i<i′⇒u_{i,h_i}<u_{i′,h_{i′}}⇔F(i)<F(i′).
Thus F is indeed a one-one mapping, so that
[TABLE]
a contradiction.
To complete the proof, it will be shown that if there were a least i′′∈{0,…,k} such
that n′_{i′′} properly divides n_{i′′} (as noted above, η is of the shape
x_0^{n′_0}x_1^{n′_1}…x_k^{n′_k}, where n′_i∣n_i for all i∈{0,…,k}),
then there would be a least j′∈{0,…,ℓ′} such that t_{j′}∈L(η).
Suppose such an i′′ did exist. Then there must be a least j′′∈{0,…,ℓ′}
for which q_{j′′}^{r_{j′′}}∣n_{i′′} and n′_{i′′}∣n_{i′′}/q_{j′′}. Hence
the number of i’s between 0 and k (inclusive) such that q_{j′′}^{r_{j′′}}∤n′_i
is at least 1 more than the number of i’s between 0 and k (inclusive) such that
q_{j′′}^{r_{j′′}}∤n_i, and the number of j_1’s between 0 and k inclusive
such that n′_{j_1}∣e_{j′′} is at least d_{j′′}+1.
Consequently, t_{j′′}∈L(η), which is the desired contradiction.
In conclusion, n′_i=n_i for all i∈{0,…,k} and thus η is equivalent to π;
this establishes that T is a teaching set for π w.r.t. NCΠ^z_{∞,m}.
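The count of prime powers less than m that appears in the upper bound at the start of this argument can be computed in two ways and compared. The sketch below is illustrative only; it assumes that log denotes the base-2 logarithm and that (m−1)^{1/i} is rounded down before ϱ is applied, and all function names are hypothetical.

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def rho(x):
    """Number of primes less than or equal to x."""
    return sum(1 for p in range(2, x + 1) if is_prime(p))


def iroot(n, i):
    """Largest integer r with r**i <= n (exact integer arithmetic)."""
    r = 0
    while (r + 1) ** i <= n:
        r += 1
    return r


def prime_powers_below(m):
    """Direct count of numbers q^r < m with q prime and r >= 1."""
    count = 0
    for q in range(2, m):
        if is_prime(q):
            power = q
            while power < m:
                count += 1
                power *= q
    return count


def prime_power_formula(m):
    """Sum over i of rho(floor((m-1)^(1/i))), for i up to floor(log2(m-1))."""
    top = (m - 1).bit_length() - 1          # floor(log2(m-1)) for m >= 2
    return sum(rho(iroot(m - 1, i)) for i in range(1, top + 1))


if __name__ == "__main__":
    print(prime_powers_below(9), prime_power_formula(9))
    print(all(prime_powers_below(m) == prime_power_formula(m) for m in range(2, 200)))
```

For m=9 both counts equal 6 (the prime powers 2, 3, 4, 5, 7, 8), so the general bound evaluates to 2+6=8.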
To prove that TD(NCΠ^z_{∞,m})≥max({ω(n):n≤m}), pick any n≤m
such that ω(n)≥ω(n′) for all n′≤m. Let p_1,…,p_{ω(n)} be all the
prime factors of n, and consider the non-cross pattern θ:=x_1^{∏_{i=1}^{ω(n)} p_i}.
For each i∈{1,…,ω(n)}, set θ_i:=x_1^{∏_{j≠i} p_j}.
We note that θ∈NCΠ^z_{∞,m} and for all i∈{1,…,ω(n)}, θ_i∈NCΠ^z_{∞,m}. Furthermore, whenever i≠j, L(θ_i)∩L(θ_j)⊆L(θ).
It follows that TD(θ,NCΠ^z_{∞,m})≥ω(n).
Suppose {0,1}⊆Σ. Let π=x_1^4x_2^8x_3^9.
There are 3 maximal proper prime power factors of 4, 8 and 9, namely,
2, 4 and 3, and so by the proof of Theorem 14,
the TD of π w.r.t. NCΠ^{|Σ|}_{∞,9} is at most
2+3=5. However, one can build a teaching set T of size 4 for π as follows.
As in the proof of Theorem 14(ii), put (v_1,+) and (v_2,−) into T, where
v_1:=(01)^4(001)^8(0001)^9
and v_2:=(01)^{9!}(001)^{9!}(0001)^{9!}(00001)^{9!}.
Arguing as in the proof of Theorem 14(ii), any pattern τ∈NCΠ^{|Σ|}_{∞,9} that is consistent with both (v_1,+) and (v_2,−) must be of the
shape x_1^{k_1}x_2^{k_2}x_3^{k_3}, where k_1∣4, k_2∣8 and k_3∣9.
Thus at this stage, it suffices to distinguish π from the three patterns x_1^2x_2^8x_3^9,
x_1^4x_2^4x_3^9 and x_1^4x_2^8x_3^3. Put (v_3,−) into T, where
[TABLE]
To see that v_3∉L(π), assume, by way of contradiction, that some morphism ψ:X∗↦Σ∗
satisfies ψ(π)=v_3. Since v_3 is not a 4-th, 8-th or 9-th power, at least two variables of π are not
mapped to the empty word by ψ.
First, suppose ψ(x_1)≠ε. Then ψ(x_1^4) must be equal to
either 0^8 or 0^4. If ψ(x_1^4)=0^8, then, since (10^6)^8 0^3 is not a 9-th power,
ψ(x_2^8)=(10^6)^8 and therefore ψ(x_3^9)=0^3, which is impossible.
The argument for the case ψ(x_1^4)=0^4 is similar.
Second, suppose ψ(x_1)=ε. Then ψ(x_2^8)≠ε and ψ(x_3^9)≠ε.
Hence ψ(x_2^8)=0^8 and so ψ(x_3^9)=(10^6)^8 0^3, which is impossible.
On the other hand, v_3∈L(x_1^2x_2^8x_3^9)∩L(x_1^4x_2^8x_3^3).
Thus it only remains to distinguish π from x_1^4x_2^4x_3^9, and this may be done
with a single negative example, say v_4:=(01)^4(001)^4(0001)^9.
Establishing the exact TD of any given pattern in NCΠ^z_{∞,m} (for any fixed finite z≥2
and m≥2) seems to be quite difficult in general. We highlight a potential difficulty faced when one tries to
apply a natural method to determine a lower bound on the TD of such a pattern.
Suppose π:=x_1^{n_1}…x_k^{n_k}, where n_1,…,n_k≥2 and k≥1. For each maximal
proper prime power factor q^r of n_i, let π_{i,q} be the pattern derived from π by replacing x_i^{n_i}
with x_i^q, and let P be the finite class of all patterns so obtained. For the sake of convenience, assume
the variables of patterns in P are renamed so that for all P_i,P_j∈P with i≠j, Var(P_i)∩Var(P_j)=∅ and Var(P_i)∩{x_1,…,x_k}=∅. For each partition 𝒫 of P and every member {P_1,…,P_d} of 𝒫,
let y_1,…,y_{ℓ′} be all the variables occurring in P_1,…,P_d. Then π is distinguishable from
{P_1,…,P_d} with a single negative example iff the sentence
[TABLE]
holds. As implied by the work of Karhumäki et al. [23], there is a
word equation E with variables in Var(P_1)∪Var(π)∪{z_1,…,z_{ℓ′′}} (for some additional variables z_1,…,z_{ℓ′′}) such that the
inequation P_1(y_1,…,y_{ℓ′})≠π(x_1,…,x_k) is equivalent to (∃z_1,…,z_{ℓ′′})E. Consequently,
(16) is equivalent to a sentence whose prenex normal form has quantifier prefix
∃∀∃ (call this an ∃∀∃-sentence; a ∀∃-sentence
is defined analogously) over a conjunction of word equations. If a decision procedure existed for all such ∃∀∃-sentences, then one could decide whether or not π is distinguishable from
{P_1,…,P_d} with a single example. More generally, one
could find a largest number f≤|P| such that for all partitions 𝒫 of P of size f′<f (that is, 𝒫 has exactly f′ members), there is a member {P_1,…,P_d} of 𝒫
from which π is not distinguishable with a single example. Then f would be a lower bound on the teaching
dimension of π w.r.t. NCΠ^z_{∞,m}. However, this method does not seem feasible because
the set of all ∀∃-sentences over positive word equations (combinations of word equations
using ∧ or ∨) is already undecidable [15].
Proof. We first compute \mboxPBTD(Π∞). It will be assumed that every pattern π in the present proof is succinct, i.e. ∣π′∣≥∣π∣
for all π′ such that L(π′)=L(π). Define a preference relation ≺ on Πz based
on the following preference hierarchy, where π and τ are any two given succinct patterns:
Rule 1:
With highest priority, prefer π to τ (i.e. τ≺π) if ∣π(ε)∣>∣τ(ε)∣.
Rule 2:
With second highest priority, prefer π to τ (i.e. τ≺π) if L(π)⊆L(τ).
Given any π∈Π∞, one can construct a teaching set T of size at most 2 for π w.r.t. (Πz,≺) as follows; τ will denote any pattern in Πz that is consistent with T and
τ⊀π. First, put
(π(ε),+) into T. Since τ(ε)⊑π(ε), Rule 1 will ensure that τ(ε)=π(ε),
that is, π and τ have identical constant parts. Second, suppose Var(π)={x_0,…,x_{k−1}}. Choose
a set {a_0,…,a_{k−1}} of k distinct letters such that {a_0,…,a_{k−1}}∩Const(π)=∅,
and put (π[x_i→a_i,0≤i≤k−1],+) into T.
By Theorem 17, the fact that π[x_i→a_i,0≤i≤k−1]∈L(τ)
implies L(π)⊆L(τ). By Rule 2, one has L(τ)=L(π), as required.
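The two examples used in this construction are easy to generate mechanically. The sketch below is a hypothetical illustration: a pattern is encoded as a list of symbols, the set of variable names is passed explicitly, and a finite pool of letters stands in for the infinite alphabet, from which fresh letters outside Const(π) are drawn.

```python
def constant_part(pattern, variables):
    """pi(eps): erase every variable and keep the constants in order."""
    return "".join(s for s in pattern if s not in variables)


def two_example_sample(pattern, variables, letter_pool):
    """Return [(pi(eps), +), (pi[x_i -> a_i], +)] with fresh, pairwise distinct a_i."""
    constants = {s for s in pattern if s not in variables}
    fresh = [a for a in letter_pool if a not in constants]
    ordered = sorted(variables)
    if len(fresh) < len(ordered):
        raise ValueError("letter pool too small to pick distinct fresh letters")
    substitution = dict(zip(ordered, fresh))
    word = "".join(substitution.get(s, s) for s in pattern)
    return [(constant_part(pattern, variables), "+"), (word, "+")]


if __name__ == "__main__":
    pi = ["x0", "0", "x1", "x0", "1"]
    print(two_example_sample(pi, {"x0", "x1"}, "01abc"))
```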
To see that PBTD(Π∞)≥2, one may apply [17, Theorem 34]; according to this theorem,
PBTD(Π∞)>1 because Π∞ contains all constant patterns as well as infinitely many patterns that
generate infinite languages.
Next, it is shown that PBTD(Π∞,m1)=Θ(m).
Suppose Σ={0}. It follows from Theorem 14
and the monotonicity of the PBTD [17, Lemma 6] that
PBTD(Π∞,m1)≥PBTD(NCΠ∞,m1)=Θ(m).
For the upper bound, we observe that every pattern in Π∞,m1 is equivalent
to a pattern of the shape $0^k x_1^{n_1} \ldots x_\ell^{n_\ell}$, where k+ℓ≥1
and 1≤n1<…<nℓ≤m (this follows from the fact that over unary alphabets,
equivalence of two patterns is preserved under permutations of the patterns’ symbols and that
any two terms of the shape $x_i^n x_j^n$ can be combined into a single term $x_i^n$).
Define the preference relation ≺ on Π∞,m1 as follows:
for any π,π′∈Π∞,m1, π≺π′ iff
•
∣π′(ε)∣>∣π(ε)∣, or
•
π′(ε)=π(ε) and L(π′)⊂L(π).
Suppose π∈Π∞,m1. If $\pi = 0^k$ for some k≥1,
then π can be taught w.r.t. (Π∞,m1,≺) using the single positive
example $(0^k,+)$ since all patterns containing $0^k$ must have a constant part of length
at least k=∣π∣ and π is preferred to all patterns with a constant part of
length less than k. Suppose $\pi = 0^k x_1^{n_1} \ldots x_\ell^{n_\ell}$ for
some ℓ≥1 such that ni<nj whenever i<j.
A teaching set for π is $T := \{(0^k,+)\} \cup \{(0^{k+n_i},+) : i \in \{1,\ldots,\ell\}\}$.
Let τ denote any pattern in Π∞,m1 that is consistent with T.
The positive example $(0^k,+)$ ensures that τ(ε)=π(ε).
Furthermore, since $0^{k+n_i} \in L(\tau)$ for all i∈{1,…,ℓ},
it follows that L(π)⊆L(τ), and so by the definition of ≺,
L(τ)=L(π).
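The unary construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a pattern $0^k x_1^{n_1} \ldots x_\ell^{n_\ell}$ is represented by the pair `(k, exponents)`, and the helper names `teaching_set` and `in_unary_language` are hypothetical, not taken from the paper. Membership relies on the fact that, over Σ={0}, a word $0^t$ belongs to the language exactly when t−k is a non-negative integer combination of the exponents.

```python
def teaching_set(k, exponents):
    """Positive examples (0^k,+) and (0^(k+n_i),+) from the construction above."""
    return [("0" * k, "+")] + [("0" * (k + n), "+") for n in sorted(exponents)]


def in_unary_language(k, exponents, word):
    """Decide whether word belongs to L(0^k x_1^{n_1} ... x_l^{n_l}) over {0}."""
    rest = len(word) - k
    if rest < 0 or any(ch != "0" for ch in word):
        return False
    # reachable[t] is True iff t is a sum of (possibly repeated) exponents.
    reachable = [True] + [False] * rest
    for t in range(1, rest + 1):
        reachable[t] = any(n <= t and reachable[t - n] for n in exponents)
    return reachable[rest]


# Hypothetical instance: pi = 0^2 x_1^2 x_2^3.
T = teaching_set(2, [2, 3])
```

The sample has ℓ+1 ≤ m+1 elements, which matches the O(m) upper bound claimed for the preference-based teaching dimension.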
Proof. The “if” direction of the lemma follows from Condition (ii), Theorem 17
and the fact that L(π)⊆L(τ) (which is in turn implied by $\tau \in \pi \shuffle Y^*$ and Y∩Var(τ)=∅). We prove the “only if” direction of the
lemma.
Condition (i):
Assume, by way of contradiction, that Y′δ were a prefix of τ.
Fix some ω∈Σ∖Const(π) and y∈Var(Y′). Set w=τ[y→ω].
Then w∈L(τ) by construction; on the other hand, since π starts with δ
but w starts with ω≠δ, w∉L(π). The proofs that δY′ is not
a suffix of τ and δY′δ′ is not a substring of τ are similar.
Condition (ii):
Note that Condition (i) implies π is similar to τ. If L(τ)⊆L(π),
then (a) ∣Σ∣=∞, (b) π is similar to τ and (c) the second part of Theorem
17 together imply Condition (ii).
Condition (iii):
Let h:(X∪Σ)∗↦(X∪Y∪Σ)∗ be any
constant-preserving morphism such that h(π)=τ. Let p1,p2,…,pn
be all the positions of π that are occupied by variables, where p1<p2<…<pn,
and for all j∈{1,…,n}, let $x_{i_j}$ denote the variable at the $p_j$th position of
π.
(For example, if $\pi = x_1 0 x_1 0 x_2 x_1 x_3 x_3$, then $i_1=i_2=1$, $i_3=2$, $i_4=1$ and
$i_5=i_6=3$.)
Suppose there is a least j∈{1,…,n} such that $h(x_{i_j})$ is not of the shape
$Y_1 x_{i_j} Y_2$, where Y1,Y2∈Y∗. Note that Const(h(x))=∅ for all
x∈Var(π), for otherwise h(π) would have more occurrences of constants than τ
(by Condition (ii), τ is similar to π). Since $x_{i_1},x_{i_2},\ldots,x_{i_n}$ occur in τ in the same order as their appearance in π,
there exists some j1∈{1,…,n} such that j1≥j and $h(x_{i_{j_1}}) \in Y^*$. Now let π′ be
the pattern obtained from π by
deleting all occurrences of $x_{i_{j_1}}$.
Let h′:(X∪Σ)∗↦(X∪Σ)∗ be a constant-preserving morphism such that $h'(x) = (h(x))|_{\Sigma \cup \mathrm{Var}(\pi)}$ for all
x∈Var(π). Then one has
[TABLE]
Consequently, by Theorem 17, L(π)⊆L(π′).
By construction, L(π′)⊆L(π) and so L(π)=L(π′). But π′
is a pattern shorter than π that generates the same language as π, contrary to
the hypothesis that π is succinct.
There are Y′∈Y+ and δ,δ′∈Σ such that
at least one of the following holds: (i) Y′δ is a prefix of τ,
(ii) δY′ is a suffix of τ or (iii) δY′δ′ is a
substring of τ.
Suppose (i) holds. Pick some y∈Var(Y′) and let τ′ be the
restriction of τ to Σ∪Var(π)∪{y}. We
show L(τ′)⊃L(π).
By construction, L(τ′)⊇L(π). Fix some ω∈Σ∖Const(π), and set w=τ′[y→ω].
Then w∈L(τ′). Further, since π starts with δ but w starts with
ω≠δ, one has w∉L(π).
A similar proof applies if (ii) or (iii) holds.
Case 2:
Not Case 1. Let x1 (resp. xn) be the leftmost
(resp. rightmost) variable of π.
For each x∈Var(π), let $Y^x_\ell$ (resp. $Y^x_r$) be the longest
substring Z in Y∗ such that every occurrence of x in τ is immediately
preceded (resp. succeeded) by Z. For each occurrence of x∈Var(π),
identify the unique y∈Y such that y immediately precedes the
corresponding occurrence of $Y^x_\ell x$, and put y into $S^x$ (if no such
y exists, then nothing needs to be done). Similarly, for each occurrence
of x∈Var(π), identify the unique z∈Y such that z immediately
succeeds the corresponding occurrence of $x Y^x_r$, and put z into
$S^x$ (again, nothing needs to be done if no such z exists).
Further, if the last (resp. first) symbol occurring in τ is some y∈Y,
put y into $S^{x_n}$ (resp. $S^{x_1}$). Lastly, for every substring of τ of the shape
xY′δ (resp. δY′x), where δ∈Σ, Y′∈Y+ and x∈Var(π),
put the last (resp. first) symbol of Y′ into $S^x$.
Let τ′ be the restriction of τ to $\Sigma \cup \mathrm{Var}(\pi) \cup \bigcup_{x \in \mathrm{Var}(\pi)} S^x$.
Note that $\tau' \in \Pi^{\infty}_{4mk+|\pi|+2,\,m}$
and $\tau' = \tau|_{\Sigma \cup \mathrm{Var}(\pi) \cup S}$ for some finite
S⊂Y.
Suppose there is a constant-preserving morphism g:(X∪Σ)∗↦(X∪Σ)∗ such that g(π)=τ′. We show that this implies the existence of a constant-preserving morphism g′:(X∪Σ)∗↦(X∪Σ)∗ such that g′(π)=τ. It will then follow that
whenever L(π)⊂L(τ), one has L(π)⊂L(τ′),
as required. By Lemma 18, every occurrence of any y∈Y
in τ′ is contained in a substring of τ′ of the shape xY′ or Y′x
for some Y′∈Y+, and for every x∈Var(π), there are $Z^x_\ell, Z^x_r \in Y^*$ for which $I_{g,\pi}$ maps the position $p_x$
of the tth occurrence of x in π (for any $t \leq \left|\pi|_{x}\right|$) to an interval $J_{p_x}$ of positions of τ′ corresponding to an occurrence
of $Z^x_\ell\, x\, Z^x_r$ in τ′ such that the position of the tth occurrence
of x in τ′ belongs to $J_{p_x}$. Suppose $\tau = \rho_1 x_1 \cdots x_n \rho_n$ and $\tau' = \rho'_1 x_1 \cdots x_n \rho'_n$, where $\rho_1, \rho'_1, \rho_n, \rho'_n \in Y^*$.
Our first step is to show $Y^{x_1}_\ell = \rho_1$ and $Y^{x_n}_r = \rho_n$.
So assume, by way of contradiction, that at least one of the following holds:
(i) $Y^{x_1}_\ell \neq \rho_1$ or (ii) $Y^{x_n}_r \neq \rho_n$.
Suppose (i) holds.
Since $Y^{x_1}_\ell \neq \rho_1$, there is a unique y∈Y immediately preceding
the first occurrence of $Y^{x_1}_\ell x_1$ in τ. Note that $Z^{x_1}_\ell = \rho'_1 = \rho_1|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$.
Furthermore, there is another substring of τ of the shape $s\,Y^{x_1}_\ell x_1$,
where s∈(Y∖{y})∪Var(π)∪Σ.
If s∈Y∖{y}, then $s \in S^{x_1}$ and so $Z^{x_1}_\ell \neq \rho_1|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x} = \rho'_1$, a contradiction.
If s∈Var(π)∪Σ, then one has
[TABLE]
which again shows $Z^{x_1}_\ell \neq \rho'_1$, a contradiction.
A similar proof shows that (ii) contradicts the definition of Yrxn.
The next step is to show that for every substring of τ of the shape
δY′x (resp. xY′δ), where δ∈Σ, Y′∈Y∗
and x∈Var(π), one has $Y^x_\ell = Y'$ (resp. $Y^x_r = Y'$).
The proof is similar to that in the preceding paragraph.
Suppose there is a substring of τ of the shape δY′x and
$Y^x_\ell \neq Y'$. There is a unique y∈Y at the $(|Y'| - |Y^x_\ell|)$th position of Y′, and so by the definition of $S^x$ one
has $y \in S^x$.
Then, as argued in the preceding paragraph, one has $Z^x_\ell \neq Y'|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$, and so such a g as described earlier cannot exist. The proof for
substrings of τ of the shape xY′δ is similar.
Thus one may safely assume that (i) $Y^{x_1}_\ell = \rho_1$, (ii) $Y^{x_n}_r = \rho_n$,
and (iii) for all substrings of τ of the shape δY′x (resp. xY′δ),
where δ∈Σ, Y′∈Y∗ and x∈Var(π), we have $Y^x_\ell = Y'$
(resp. $Y^x_r = Y'$).
We next observe that for any xi∈Var(π) and $y \in \bigcup_{x \in \mathrm{Var}(\pi)} S^x$,
$\#(y)[Z^{x_i}_\ell] \leq \#(y)[Y^{x_i}_\ell]$. To see this, suppose first
that there is an occurrence of $Y^{x_i}_\ell x_i$ that is not immediately preceded
by any y∈Y. Then $Z^{x_i}_\ell = Y^{x_i}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$
and thus for all $y \in \bigcup_{x \in \mathrm{Var}(\pi)} S^x$, $\#(y)[Z^{x_i}_\ell] \leq \#(y)[Y^{x_i}_\ell]$.
Second, suppose that every occurrence of $Y^{x_i}_\ell$ is immediately preceded
by some y∈Y. Thus, by the choice of $Y^{x_i}_\ell$, there must exist distinct
y′,y′′∈Y such that $y'\,Y^{x_i}_\ell x_i$ and $y''\,Y^{x_i}_\ell x_i$ are substrings
of τ. By the definition of $S^{x_i}$, $y', y'' \in S^{x_i}$. Hence both
$y'\,Y^{x_i}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}\,x_i$ and
$y''\,Y^{x_i}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}\,x_i$ are substrings of τ′,
and therefore $Z^{x_i}_\ell$ is a suffix of $Y^{x_i}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$.
Consequently, $\#(y)[Z^{x_i}_\ell] \leq \#(y)[Y^{x_i}_\ell]$ for all $y \in \bigcup_{x \in \mathrm{Var}(\pi)} S^x$,
as required.
Similarly, for any xi∈Var(π) and $y \in \bigcup_{x \in \mathrm{Var}(\pi)} S^x$,
$\#(y)[Z^{x_i}_r] \leq \#(y)[Y^{x_i}_r]$.
For every xi∈Var(π), let $\alpha_i$ be the longest suffix
of $Y^{x_i}_\ell$ such that $\alpha_i|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x} = Z^{x_i}_\ell$ and let $\beta_i$ be the shortest prefix of $Y^{x_i}_r$
such that $\beta_i|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x} = Z^{x_i}_r$
(by the remarks in the preceding paragraph, such $\alpha_i$ and $\beta_i$
exist). Set $g'(x_i) = \alpha_i x_i \beta_i$. For example, suppose
$Y^{x_i}_\ell = y_1 y_2^2 y_3 y_1 y_3 y_4 y_1$, $Y^{x_i}_r = y_1 y_2 y_3 y_1 y_3 y_2 y_4$,
$Z^{x_i}_\ell = y_1^2$, $Z^{x_i}_r = y_1 y_2 y_1$ and $\bigcup_{x \in \mathrm{Var}(\pi)} S^x = \{y_1, y_2\}$. Then $\alpha_i = y_3 y_1 y_3 y_4 y_1$ and $\beta_i = y_1 y_2 y_3 y_1$.
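The worked example for the longest suffix and shortest prefix can be checked mechanically. The following Python sketch assumes strings of variables are represented as lists of names; the helpers `restrict`, `longest_suffix` and `shortest_prefix` are illustrative names introduced here, not notation from the paper.

```python
def restrict(seq, keep):
    """Restriction of a string of variables to the symbols in keep."""
    return [s for s in seq if s in keep]


def longest_suffix(seq, keep, target):
    """Longest suffix of seq whose restriction to keep equals target (or None)."""
    for start in range(len(seq) + 1):  # start = 0 is the longest candidate
        if restrict(seq[start:], keep) == target:
            return seq[start:]
    return None


def shortest_prefix(seq, keep, target):
    """Shortest prefix of seq whose restriction to keep equals target (or None)."""
    for end in range(len(seq) + 1):  # end = 0 is the shortest candidate
        if restrict(seq[:end], keep) == target:
            return seq[:end]
    return None


# Data of the example in the text.
Y_left = ["y1", "y2", "y2", "y3", "y1", "y3", "y4", "y1"]
Y_right = ["y1", "y2", "y3", "y1", "y3", "y2", "y4"]
S = {"y1", "y2"}
alpha = longest_suffix(Y_left, S, ["y1", "y1"])
beta = shortest_prefix(Y_right, S, ["y1", "y2", "y1"])
```

Here `alpha` and `beta` come out as the strings stated in the example, so the morphism image of the variable would be `alpha`, followed by the variable, followed by `beta`.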
It remains to verify that g′(π)=τ. By the present case assumption,
every occurrence of any substring Z∈Y∗ of τ is contained in a
substring θ of τ satisfying at least one of the following: (a) θ=Zx1 and θ is a prefix of τ;
(b) θ=xnZ and θ is a suffix of τ; (c) θ=xiZxj,
for some xi,xj∈Var(π); (d) θ=δZxi for some δ∈Σ
and xi∈Var(π); (e) θ=xiZδ for some δ∈Σ and xi∈Var(π).
Thus, since g′(xi)=αixiβi for all xi∈Var(π), it suffices to
show: (a) α1=ρ1; (b) βn=ρn; (c) if xiZxj is a substring
of τ for some Z∈Y∗, then βiαj=Z; (d) if δZxi is a substring of τ
for some Z∈Y∗ and δ∈Σ, then αi=Z; (e) if xiZδ is a substring of τ
for some Z∈Y∗ and δ∈Σ, then βi=Z.
Assertion (a):
Note that since $Y^{x_1}_\ell = \rho_1$ and $Z^{x_1}_\ell = \rho'_1 = \rho_1|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$, we have $Z^{x_1}_\ell = Y^{x_1}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$.
Consequently, $\alpha_1 = Y^{x_1}_\ell = \rho_1$.
Assertion (b):
An argument similar to that in the proof of Assertion (a)
yields $Z^{x_n}_r = Y^{x_n}_r|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$.
Furthermore, since, if $\rho_n \neq \varepsilon$, $S^{x_n}$ must contain the last
variable occurring in $\rho_n$ ($= Y^{x_n}_r$), the shortest prefix of $Y^{x_n}_r$
whose restriction to $\bigcup_{x \in \mathrm{Var}(\pi)} S^x$ equals $Z^{x_n}_r$
is $Y^{x_n}_r$. Therefore $\beta_n = Y^{x_n}_r = \rho_n$.
Assertion (c):
Suppose that for some xi,xj∈Var(π) and Z∈Y∗, $x_i Z x_j$
is a substring of τ. One must show $\beta_i \alpha_j = Z$.
First, suppose $Z^{x_i}_r = \varepsilon$. Then $Z^{x_j}_\ell = Z|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$.
Since $Z^{x_j}_\ell$ is a suffix of $Y^{x_j}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$, it follows that
$Z|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$ is a suffix of $Y^{x_j}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$. As
$Y^{x_j}_\ell$ is a suffix of Z, one also has that
$Y^{x_j}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$ is a suffix of $Z|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$,
and therefore $Y^{x_j}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x} = Z|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$.
If $Y^{x_j}_\ell \neq Z$, then there is some y∈Y immediately preceding $Y^{x_j}_\ell$ in Z such that
$y \in S^{x_j}$, implying $Z|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x} \neq Y^{x_j}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$.
Hence $Y^{x_j}_\ell = Z$. Since $\alpha_j$ is the longest suffix of $Y^{x_j}_\ell$ whose restriction to
$\bigcup_{x \in \mathrm{Var}(\pi)} S^x$ equals $Z^{x_j}_\ell$ and
[TABLE]
we have $\alpha_j = Y^{x_j}_\ell$.
Furthermore, since $\beta_i$ is the shortest prefix of
$Y^{x_i}_r$ whose restriction to $\bigcup_{x \in \mathrm{Var}(\pi)} S^x$
equals $Z^{x_i}_r$, one has $\beta_i = \varepsilon$, and so
$\beta_i \alpha_j = Y^{x_j}_\ell = Z$.
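Second, suppose $Z^{x_i}_r \neq \varepsilon$. Then $Y^{x_i}_r \neq \varepsilon$.
Recall that $\alpha_j$ is the longest suffix of $Y^{x_j}_\ell$ whose
restriction to $\bigcup_{x \in \mathrm{Var}(\pi)} S^x$ equals $Z^{x_j}_\ell$,
and that $Y^{x_j}_\ell$ is a suffix of Z. Let $Z = \gamma \alpha_j$, where
γ∈Y∗. Since $Z^{x_i}_r \neq \varepsilon$, $\gamma \neq \varepsilon$.
In particular, note that $\gamma[|\gamma|] \in \bigcup_{x \in \mathrm{Var}(\pi)} S^x$ due to
the following reasons: if $\alpha_j = Y^{x_j}_\ell$, then $\gamma[|\gamma|] \in S^{x_j}$
by the definition of $S^{x_j}$; if $\alpha_j$ were a proper suffix of $Y^{x_j}_\ell$
and $\gamma[|\gamma|] \notin \bigcup_{x \in \mathrm{Var}(\pi)} S^x$, then $\gamma[|\gamma|]\,\alpha_j$
would be a suffix of $Y^{x_j}_\ell$ longer than $\alpha_j$ whose restriction
to $\bigcup_{x \in \mathrm{Var}(\pi)} S^x$ equals $Z^{x_j}_\ell$.
Thus $\gamma[|\gamma|]$ is equal to the last symbol of $Z^{x_i}_r$; denote this symbol by y.
One has $\#(y)[\alpha_j] = \#(y)[Z^{x_j}_\ell]$ and thus $\#(y)[\gamma] = \#(y)[Z] - \#(y)[\alpha_j] = \#(y)[Z^{x_i}_r] + \#(y)[Z^{x_j}_\ell] - \#(y)[\alpha_j] = \#(y)[Z^{x_i}_r]$.
It follows that γ is the shortest prefix of $Y^{x_i}_r$ whose restriction
to $\bigcup_{x \in \mathrm{Var}(\pi)} S^x$ equals $Z^{x_i}_r$, which means that
$\gamma = \beta_i$.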
Assertion (d):
Suppose that for some δ∈Σ, Z∈Y∗ and xi∈Var(π),
$\delta Z x_i$ is a substring of τ. One must show $\alpha_i = Z$.
As was proven earlier, $Y^{x_i}_\ell = Z$. Hence $Z^{x_i}_\ell = Z|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x} = Y^{x_i}_\ell|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$. Since $\alpha_i$ is the
longest suffix of $Y^{x_i}_\ell$ with $\alpha_i|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x} = Z^{x_i}_\ell$, one has $\alpha_i = Y^{x_i}_\ell = Z$.
Assertion (e):
Suppose that for some δ∈Σ, Z∈Y∗ and xi∈Var(π),
$x_i Z \delta$ is a substring of τ. One must show $\beta_i = Z$. First,
$Y^{x_i}_r = Z$ was proven earlier. As in the proof of Assertion (d), $Z^{x_i}_r = Z|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x} = Y^{x_i}_r|_{\bigcup_{x \in \mathrm{Var}(\pi)} S^x}$. Furthermore, since $S^{x_i}$ contains
the last symbol of Z, $Y^{x_i}_r$ is the shortest prefix of $Y^{x_i}_r$ ($= Z$) whose restriction
to $\bigcup_{x \in \mathrm{Var}(\pi)} S^x$ equals $Z^{x_i}_r$.
Hence $\beta_i = Y^{x_i}_r = Z$.
Proof. Assertion (i). Suppose Σ={0}. Then every π∈Π∞,m1
is equivalent to a pattern of the shape $0^k x_1^{p_1} \ldots x_n^{p_n}$,
where k≥0 and 0≤p1<…<pn≤m. The constant part of
π may be taught using the sample $\{(0^k,+)\} \cup \{(0^{k-i},-) : 1 \leq i \leq \min(\{k,m\})\}$.
Furthermore, for each k, there are at most $\sum_{i=0}^{m} \binom{m}{i} = 2^m$ many patterns π′ of the shape
$0^k x_1^{p'_1} \ldots x_\ell^{p'_\ell}$, where 0≤p1′<…<pℓ′≤m.
For each such pattern π′ with L(π′)≠L(π), π′ can be
distinguished from π using a word in the symmetric difference of L(π)
and L(π′). It follows that π has a teaching set of size at most $2^m + m + 1$, as required.
Now suppose ∣Σ∣=∞ and Σ∖Const(π)={a1,a2,a3,…}.
Let k be the number of distinct variables in π.
We build a teaching set T for π w.r.t. Π∞,m∞.
Let τ denote any pattern in Π∞,m∞ that is consistent with T.
Given π=X1c1X2c2…cn−1Xn∈Π∞,m∞,
where X1,X2,…,Xn∈X∗ and c1,c2,…,cn−1∈Σ+,
put all $O(2^{|\pi(\varepsilon)|})$ elements of {(π(ε),+)}∪{(v,−):v⊏π(ε)} into T;
these examples ensure that τ(ε)=π(ε). Next, set w=π[xi→ai:xi∈Var(π)] and put (w,+) into T.
Then w∈L(τ) implies there is a substitution g:X↦Σ∗
such that for some S⊆Var(τ), $g(\tau|_S) = w$
and g(x)≠ε for all $x \in \mathrm{Var}(\tau|_S)$. Fix such an S.
Let g′ be a morphism such that g′(ai)=xi for all i; one has
$(g' \circ g)(\tau|_S) = \pi$, and
so $L(\pi) \subseteq L(\tau|_S) \subseteq L(\tau)$.
There are at most $O((1+|\pi|)^{|\pi|})$ patterns τ′ (up to equivalence) such that for
some substitution h:X↦Σ∗, h(τ′)=w and h(x)≠ε
for all x∈Var(τ′); note that each such τ′ satisfies
L(π)⊆L(τ′). For each such τ′ with L(τ′)⊃L(π),
pick $w_{\tau'} \in L(\tau') \setminus L(\pi)$ and put $(w_{\tau'},-)$ into T.
The latter negative examples ensure that $L(\pi) = L(\tau|_S)$. Moreover,
since π is succinct and $L(\pi) = L(\tau|_S)$, it follows from Lemma
18 that $\tau|_S$ is equal to π up to
a renaming of variables. Thus, up to a renaming of variables, $\tau \in \pi \shuffle Y^*$
for some infinite set Y of variables with Y∩Var(π)=∅.
By Lemma 19, there exists some $\tau' \in \Pi^{\infty}_{4mk+|\pi|+2,\,m}$ such that $\tau' = \tau|_{S'}$ for some
finite S′⊆Y, and if L(π)⊂L(τ), then L(π)⊂L(τ′).
For every $\tau'' \in \Pi^{\infty}_{4mk+|\pi|+2,\,m} \cap (\pi \shuffle Y^*)$ with τ′′(ε)=π(ε) and L(τ′′)⊃L(π), pick some
$w_{\tau''} \in L(\tau'') \setminus L(\pi)$
and put $(w_{\tau''},-)$ into T; there are at most $O((D+1)^D)$ many such τ′′ (up to equivalence), where $D := (4mk+|\pi|+2) \cdot m$. These negative examples ensure that
L(τ)⊅L(π). Therefore L(τ)=L(π), which proves that
T is indeed a teaching set of size $O((D+1)^D)$ for π w.r.t. Π∞,m∞.
Assertion (ii). The first part of this assertion follows quite directly from the
proof of a result in [8]. As the latter reference is currently under review, we reproduce the
proof here (with a few minor modifications).
In this proof, Πkz denotes the class of all k-variable patterns.
Let π be a given (k−1)-variable pattern in which every variable occurs at most m times, and suppose
for the sake of a contradiction that π is not simple block-regular but it has a finite teaching set T
w.r.t. Πkz.
Let
[TABLE]
where Y1,Yn∈X∗, Y2,…,Yn−1∈X+, c1,…,cn−1∈Σ+
and I1,…,In−1 are the closed intervals of positions of π corresponding,
respectively, to the particular occurrences of the constant blocks c1,…,cn−1
as marked in Equation (17).
Fix some s>max({∣α∣:α∈T+∪T−∪{π}}) and pick a
variable y∈X∖Var(π). We consider three cases.
Case 1:
There is a least i∈{1,…,n} such that Yi≠ε and every
variable in Yi occurs at least twice in π. We will assume that 2≤i≤n−1, as the cases i=1 and i=n can be handled very similarly.
Fix some distinct a,b∈Σ such that both a and b differ from the last
symbol of ci−1 as well as the first symbol of ci. (Such choices of a and b
are possible because ∣Σ∣≥4.)
Suppose Yi starts at the pth position of π.
We consider two subcases.
Case 1.1:
For every variable x occurring in $Y_i$, x occurs
in some $Y_{i'}$ with $i' \neq i$.
Let π′ be the pattern derived from π by inserting $y^s$ between
the pth and the (p+1)st positions of π; π′∈1Πmz
because no variable of π occurs more than m times.
Note that L(π)⊆L(π′) by construction, and so π′ is
consistent with {(v,+):v∈T+}. Moreover, since ∣w∣>max({∣α∣:α∈T−}) for all w∈L(π′)∖L(π), π′ is also
consistent with {(v,−):v∈T−}.
Hence π′ is consistent with T. Furthermore, let w=π′[y→a].
Decompose w as
[TABLE]
where J1,…,Ji,…,Jn−1 are the closed intervals of positions of w corresponding,
respectively, to the particular occurrences of the constant blocks c1,…,ci,…,cn−1 as marked in
Equation (18). Assume, by way of contradiction, that there
exists a substitution h:X↦Σ∗ such that h(π)=w. By the choice of a,
Ih,π(Ii−1) cannot be an interval starting or ending between Ji−1 and Ji.
Furthermore, Ih,π(Ii−1) cannot intersect any of the intervals J1,…,Ji−2
because otherwise ∑j=1i−2∣Ih,π(Ij)∣ would be smaller than
∑j=1i−2∣Jj∣, which is impossible. Similarly, Ih,π(Ii−1)
cannot intersect any of the intervals Ji,…,Jn−1. Hence Ih,π(Ii−1)=Ji−1.
An analogous argument shows that Ih,π(Ii)=Ji. It follows that for all
j∈{1,…,n−1}, Ih,π(Ij)=Jj. Thus there is a subsequence $Y'_i$ of $Y_i$ such that $Y'_i \neq \varepsilon$ and $h(Y'_i) = a^s$. However, based on Equation (18) and the fact that every variable
of $Y_i$ occurs in some $Y_{i'}$ with $i' \neq i$, it can be concluded that $h(Y'_i) = \varepsilon$, which contradicts $h(Y'_i) = a^s$.
Thus w∈L(π′)∖L(π), and so T cannot be a teaching
set for π w.r.t. Πkz.
Case 1.2:
$Y_i$ contains at least one variable that
does not occur in any $Y_j$ with $j \neq i$. Let $x_{j_1},\ldots,x_{j_\ell}$
be all the variables of $Y_i$ that do not occur outside $Y_i$,
and let $p_1,p_2,\ldots,p_{\ell'}$ be all the positions of π
that are occupied by some $x_{j_q}$ with q∈{1,…,ℓ},
where $p_1 < p_2 < \ldots < p_{\ell'}$.
Let π′ be the pattern derived from π by simultaneously inserting
$y^{2s-j+1}$ between the $(p_j-1)$st
and the $p_j$th positions of π for all j∈{1,…,ℓ′}.
For example, if $\pi = x_1 x_2 a x_2 x_3 x_3 b x_4$ and i=2, then
$\pi' = x_1 x_2 a x_2 y^{2s} x_3 y^{2s-1} x_3 b x_4$.
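The padding step of Case 1.2 can be reproduced on the example above with a short Python sketch. It assumes a pattern is a list of tokens in which names starting with "x" are variables and all other tokens are constants; `variable_blocks` and `insert_padding` are hypothetical helper names, and the small value of s used below is chosen only for readability (the proof takes s larger than every example in T).

```python
def is_var(tok):
    return tok.startswith("x")


def variable_blocks(pattern):
    """Index ranges (start, end) of the variable blocks Y_1, ..., Y_n of the pattern."""
    blocks, start, i = [], 0, 0
    while i < len(pattern):
        if is_var(pattern[i]):
            i += 1
            continue
        blocks.append((start, i))  # the block that ends before this constant run
        while i < len(pattern) and not is_var(pattern[i]):
            i += 1
        start = i
    blocks.append((start, len(pattern)))
    return blocks


def insert_padding(pattern, i, s, fresh="y"):
    """Insert fresh^(2s-j+1) before the j-th occurrence of a variable private to Y_i."""
    lo, hi = variable_blocks(pattern)[i - 1]
    outside = {t for idx, t in enumerate(pattern) if is_var(t) and not lo <= idx < hi}
    out, j = [], 0
    for idx, tok in enumerate(pattern):
        if lo <= idx < hi and tok not in outside:
            j += 1
            out.extend([fresh] * (2 * s - j + 1))
        out.append(tok)
    return out


pi = ["x1", "x2", "a", "x2", "x3", "x3", "b", "x4"]
pi_prime = insert_padding(pi, i=2, s=2)
```

With s=2, `pi_prime` places four copies of y before the first private occurrence of the variable x3 and three copies before the second, mirroring the exponents 2s and 2s−1 in the example.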
Note that π′∈1Πmz. By construction, L(π)⊆L(π′) and π′
is consistent with T. Now set $\beta = \pi'[y \to a,\ x_{j_q} \to b,\ 1 \leq q \leq \ell]$. We argue that β∉L(π).
One has that
[TABLE]
By arguing as in Case 1.1, the choice of a,b implies that if
$Y'_i$ is the restriction of $Y_i$ to $\{x_{j_1},\ldots,x_{j_\ell}\}$,
then there is a substitution h:X↦Σ∗ such that $\gamma = h(Y'_i)$, where γ
is as defined in Equation (19). Note that $|Y'_i| = \ell'$.
That $\gamma \neq h(Y'_i)$ will follow from Lemma A.2
and the following claim.
Claim T.1
If γ=h(Yi′), then γ has at least ∣Yi′∣ cuts relative to (h,Yi′).
Proof of Claim T.1.
We first decompose γ as follows:
[TABLE]
Claim T.1 will follow from the fact that for all
j∈{1,…,ℓ′}, Ij contains at least one cut-point.
First, observe that since $a^{2s}$ occurs exactly once as a substring
of γ and every variable of $Y'_i$ occurs at least twice in $Y'_i$,
there cannot exist any q∈{1,…,ℓ} such that
$a^{2s}$ is a substring of $h(x_{j_q})$. Thus I1 must contain
at least one cut-point of γ. Second, for all j∈{2,…,ℓ′},
$b\,a^{2s-j+1}\,b$ occurs exactly once as a substring of γ.
Arguing as before, we conclude that Ij contains at least one cut-point of
γ. (Claim T.1)
It follows from Lemma A.2 and
Claim T.1 that $\gamma \neq h(Y'_i)$
and therefore β∉L(π), as desired.
Case 2:
π contains a substring of the shape ab, where a,b∈Σ
(a and b are not necessarily distinct). Since ∣Σ∣≥4, one can
fix some c∈Σ with c∈/{a,b}.
Let $j_3$ be a position of π such that
$\pi[j_3]\,\pi[j_3+1] = ab$. If L(π) had a finite
teaching set T w.r.t. Πz, then one can argue
as in Case 1 that there is a positive s so
large that if π′ is obtained from π by inserting
$y^s$ between the $j_3$th and $(j_3+1)$st positions
of π, then π′ would be consistent with T.
On the other hand, let γ be the string derived
from π′ by substituting c for y and ε for every other variable;
note that the number of times the substring ab
occurs in γ is strictly less than the number
of times that ab occurs in π, which implies
γ∉L(π) and so L(π′)≠L(π). Therefore TD(π,Πz)=∞.
Case 3:
π does not start with a variable or does not end with a variable. Suppose π starts with the constant symbol a.
The proof that L(π) has no finite teaching set w.r.t. Πz is very similar to that in Case 2;
the only difference here is that one chooses some
b∈Σ∖{a} and considers $\pi' = y^s \pi$
for some variable y∉Var(π) and a sufficiently
large s. In this case, $b^s \pi(\varepsilon) \in L(\pi') \setminus L(\pi)$, and therefore
L(π′)≠L(π). An analogous argument holds if π ends with a
constant symbol.
Next, we prove the second part of the assertion. Suppose π contains a variable x that
occurs ℓ times for some ℓ>m. We build a teaching set T for π w.r.t. 1Πm∞.
First, put the sample {(π(ε),+)}∪{(w′,−):w′⊏π(ε)} into T; this
sample uniquely identifies the constant part of π (i.e. π(ε)).
Second, pick some a∈Σ∖Const(π) and put (π[x→a],+) into T;
this additional example reduces the version space to all patterns in 1Πm∞∩Π∞,ℓ∞. Since, by Assertion (i), every pattern in Π∞,ℓ∞
has a finite TD, this implies that TD(π,1Πm∞)≤TD(π,Π∞,ℓ∞)<∞,
as required. Furthermore, if π is simple block-regular, then it follows from [7, Proposition 4]
that TD(π,1Πm∞)≤TD(π,Π∞)<∞.
Proof. Suppose Σ={0,1}. Let {xi,j:i,j∈N0} and {yi,j:i,j∈N0}
be two disjoint infinite sets of variables.
It suffices to show that π does not possess a finite tell-tale w.r.t. Π∞,4,cf2 (i.e. a finite set S⊆L(π) such that for all τ∈Π∞,4,cf2,
one has S⊆L(τ)⊆L(π)⇒L(τ)=L(π)).
Following the proof in [31], assume, by way of contradiction,
that π has a finite tell-tale {w1,…,wn} for some n≥1.
Without loss of generality, assume that wi≠ε for all i∈{1,…,n}.
For each i∈{1,…,n}, there is a substitution σi:X↦Σ∗
witnessing σi(π)=wi.
Set σ~i(π):=σi(x1)σi(x2)σi(x3).
We define, for each i∈{1,…,n}, patterns γi,1,γi,2 and γi,3
according to the following case distinction.
Case 1:
There is some δ∈Const(wi) such that the last occurrence
of δ in σ~i(π) is strictly before the (∣σi(x1)∣+1)-st
position of σ~i(π). Set γi,1=ε.
Let $\ell_1 := \left|\tilde{\sigma}_i(\pi)|_{\{\delta\}}\right|$ (resp. $\ell_2 := \left|\tilde{\sigma}_i(\pi)|_{\{\overline{\delta}\}}\right|$),
i.e. ℓ1 (resp. ℓ2) is the number of occurrences of δ
(resp. $\overline{\delta}$) in σ~i(π). Note that the case
assumption implies $\sigma_i(x_2)\sigma_i(x_3) \in \{\overline{\delta}\}^*$.
Suppose ℓ1=2p1+r1 and ℓ2=2p2+r2 for some p1,p2≥0
and r1,r2∈{0,1}. Let τi be the pattern derived from σ~i(π)
as follows: if p1≥1 (resp. p2≥1), then for all j∈{0,…,p1−1}
(resp. j∈{0,…,p2−1}), substitute xi,j (resp. yi,j) for
the (2j+1)-st and (2j+2)-nd occurrences of δ (resp. $\overline{\delta}$)
in σ~i(π), and if r1=1 (resp. r2=1), then substitute xi,p1
(resp. yi,p2) for the (2p1+1)-st (resp. (2p2+1)-st) occurrence of δ
(resp. $\overline{\delta}$) in σ~i(π).
Define γi,2 to be the prefix of τi of length ∣σi(x1)∣ and
define γi,3 to be the suffix of τi of length ∣σi(x2)σi(x3)∣.
Case 2:
Not Case 1. Then for all δ∈Const(wi), the position of
the last occurrence of δ in σ~i(π) is greater than
∣σi(x1)∣.
Let $\ell'_1 := \left|\tilde{\sigma}_i(\pi)|_{\{0\}}\right|$ (resp. $\ell'_2 := \left|\tilde{\sigma}_i(\pi)|_{\{1\}}\right|$).
Suppose ℓ1′=2q1+s1 and ℓ2′=2q2+s2 for some q1,q2≥0
and s1,s2∈{0,1}. As in Case 1, let τi be the pattern derived from
σ~i(π) as follows: if q1≥1 (resp. q2≥1), then for all
j∈{0,…,q1−1} (resp. j∈{0,…,q2−1}), substitute xi,j (resp. yi,j) for
the (2j+1)-st and (2j+2)-nd occurrences of 0 (resp. 1)
in σ~i(π), and if s1=1 (resp. s2=1), then substitute xi,q1
(resp. yi,q2) for the (2q1+1)-st (resp. (2q2+1)-st) occurrence of 0
(resp. 1) in σ~i(π).
Define γi,1 to be the prefix of τi of length ∣σi(x1)∣,
define γi,2 to be the substring of τi of length ∣σi(x2)∣
that starts at the (∣σi(x1)∣+1)-st position of τi, and
define γi,3 to be the suffix of τi of length ∣σi(x3)∣.
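The pairing rule used to build τi in Case 2 can be illustrated by a small Python sketch. It assumes a variable is encoded as a tuple (family, i, j), with family "x" standing for occurrences of 0 and "y" for occurrences of 1; the function names `pair_up` and `read_back` are illustrative only.

```python
from collections import Counter


def pair_up(word, i):
    """Replace the letters of a binary word by variables, two occurrences per variable.

    The (2j+1)-st and (2j+2)-nd occurrences of 0 become ("x", i, j), and likewise
    for 1 with ("y", i, j); an unpaired last occurrence gets a variable of its own.
    """
    seen = Counter()
    family = {"0": "x", "1": "y"}
    out = []
    for ch in word:
        out.append((family[ch], i, seen[ch] // 2))
        seen[ch] += 1
    return out


def read_back(pattern):
    """Substitute 0 for every x-variable and 1 for every y-variable."""
    return "".join("0" if fam == "x" else "1" for fam, _, _ in pattern)


tau_1 = pair_up("0100", 1)  # hypothetical word playing the role of the image of pi
```

By construction every variable of the resulting pattern occurs at most twice, and `read_back` recovers the original word, which is the property needed for the word to stay in the language of the constructed pattern.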
Set
[TABLE]
In order to derive a contradiction, it will be shown that (a) {w1,…,wn}⊆L(τ) and (b) L(τ)⊂L(π). (In [31],
τ is known as a passe-partout for π and {w1,…,wn}.)
Proof of (a). For i∈{1,…,n}, let
φi:X↦Σ∗ be the morphism defined as follows.
If σi falls into Case 1, let δ be a letter as defined in Case 1
for σi. For all j∈N0, set φi(xi,j)=δ and
$\varphi_i(y_{i,j}) = \overline{\delta}$. For all i′≠i and j∈N0,
set φi(xi′,j)=φi(yi′,j)=ε. It may be directly
verified that φi(τ)=wi.
Suppose σi falls into Case 2. For all j∈N0, set φi(xi,j)=0 and
φi(yi,j)=1. For all i′≠i and j∈N0,
set φi(xi′,j)=φi(yi′,j)=ε. Then φi(τ)=wi.
Proof of (b). By Theorem 17,
it is enough to show that there is a morphism ψ:X∗↦X∗
such that ψ(π)=τ but there does not exist any morphism
θ:X∗↦X∗ for which θ(τ)=π.
For the first part, define, for each i∈{1,2,3}, the substitution
ψ(xi)=γ1,i…γn,i. It follows that ψ(π)=τ.
For the second part, we first note that by construction, every variable
of τ that occurs exactly twice must belong to Var(γ1,2…γn,2γ1,3…γn,3). Consequently, for all morphisms θ:X∗↦X∗, if θ(τ) contains exactly three variables, each of
which occurs exactly twice, then θ(τ) is equivalent to one of the
following patterns: $x_1 x_2 x_1 x_2 x_3^2$, or $x_1 x_2 x_3 x_1 x_2 x_3$,
or $x_1^2 x_2 x_3 x_2 x_3$. Thus θ(τ) cannot be equivalent to π.
We conclude from (a) and (b) that {w1,…,wn} cannot be a tell-tale
for π w.r.t. Π∞,4,cf2, contrary to assumption.
We prove that $T := \{(\varepsilon,+),\ (0^2 1^2 0^2,+),\ (0,-),\ (0 1^2 0,-),\ (0^3,-),\ ((01)^2 (0^2 1)^2 (0^3 1)^2 (0^4 1)^2,-)\}$
is a teaching set for $\pi := x_1^2 x_2^2 x_3^2$ w.r.t. Π∞,32.
Let τ be any pattern
in Π∞,32 that is consistent with T. Since ε∈L(τ), τ does not contain
any constant symbols. The negative examples (0,−) and
$(0^3,-)$ ensure that every variable of τ occurs exactly twice.
The consistency of τ with $(0 1^2 0,-)$ then implies that τ is a non-cross
pattern, i.e., τ is equivalent to a pattern of the shape
$x_1^2 x_2^2 \ldots x_k^2$ for some k. Since $0^2 1^2 0^2 \in L(\tau)$, k≥3.
Finally, $(01)^2 (0^2 1)^2 (0^3 1)^2 (0^4 1)^2 \notin L(\tau)$ implies that k≤3.
Hence τ is equivalent to $x_1^2 x_2^2 x_3^2$.
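The role of each example in T can be checked with a brute-force membership test for erasing pattern languages. The Python sketch below is an illustration, not the paper's method: patterns are lists of tokens, names starting with "x" are variables, and `in_language`, `squares` and `consistent` are hypothetical helpers. The search is exponential in general, which is acceptable only because the words here are short.

```python
def in_language(pattern, word):
    """Brute-force test of whether word lies in the erasing language of pattern."""
    def match(i, pos, subst):
        if i == len(pattern):
            return pos == len(word)
        tok = pattern[i]
        if not tok.startswith("x") or tok in subst:
            image = subst.get(tok, tok)  # constants stand for themselves
            return word.startswith(image, pos) and match(i + 1, pos + len(image), subst)
        for end in range(pos, len(word) + 1):  # try every image, including the empty word
            subst[tok] = word[pos:end]
            if match(i + 1, end, subst):
                return True
        del subst[tok]
        return False

    return match(0, 0, {})


def squares(k):
    """The non-cross pattern x1^2 x2^2 ... xk^2 as a token list."""
    return [f"x{i}" for i in range(1, k + 1) for _ in range(2)]


def consistent(pattern, sample):
    return all(in_language(pattern, w) == (label == "+") for w, label in sample)


LONG = "01" * 2 + "001" * 2 + "0001" * 2 + "00001" * 2
T = [("", "+"), ("001100", "+"), ("0", "-"), ("0110", "-"), ("000", "-"), (LONG, "-")]
PI = squares(3)
```

On this sample, `squares(2)` fails the positive example 001100 and `squares(4)` generates the long negative example, while `PI` agrees with every label, in line with the bounds k≥3 and k≤3 derived above.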
Bibliography
[1] A. V. Aho. Algorithms for finding patterns in strings. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science, Vol. A: Algorithms and Complexity, chapter 5, pages 257–300. MIT Press, Oxford, 1990.
[2] A. Amir and I. Nor. Generalized function matching. J. Disc. Algo., 5(3):514–523, 2007.
[3] D. Angluin. Finding patterns common to a set of strings. J. Comput. Syst. Sci., 21:46–62, 1980.
[4] D. Angluin. Inductive inference of formal languages from positive data. Information and Control, 45(2):117–135, 1980.
[5] D. Angluin, J. Aspnes, S. Eisenstat and A. Kontorovich. On the learnability of shuffle ideals. J. Mach. Learn. Res., 14:1513–1531, 2013.
[6] B. S. Baker. Parameterized pattern matching: Algorithms and applications. J. Comput. Syst. Sci., 52(1):28–42, 1996.
[7] F. Bayeh, Z. Gao and S. Zilles. Erasing pattern languages distinguishable by a finite number of strings. In ALT, pages 72–108, 2017.
[8] F. Bayeh, Z. Gao, and S. Zilles. Erasing pattern languages distinguishable by a finite number of strings, 2018. Manuscript under review.