The teaching complexity of erasing pattern languages with bounded variable frequency

Ziyuan Gao

arXiv:1905.07737·cs.FL·May 21, 2019

TL;DR

This paper investigates how bounding variable frequency in pattern languages affects the complexity of learning and teaching these patterns, focusing on the minimum number of examples needed for unique identification in different models.

Contribution

It introduces the study of variable frequency bounds in pattern languages and analyzes their impact on teaching complexity in cooperative learning models.

Findings

1. Bounding variable frequency influences the teaching dimension of pattern classes.

2. The paper provides bounds on the number of examples needed for pattern identification.

3. It compares teaching complexity across different models with variable frequency restrictions.

Abstract

Patterns provide a concise, syntactic way of describing a set of strings, but their expressive power comes at a price: a number of fundamental decision problems concerning (erasing) pattern languages, such as the membership problem and inclusion problem, are known to be NP-complete or even undecidable, while the decidability of the equivalence problem is still open; in learning theory, the class of pattern languages is unlearnable in models such as the distribution-free (PAC) framework (if $\mathcal{P}/poly\neq\mathcal{NP}/poly$). Much work on the algorithmic learning of pattern languages has thus focussed on interesting subclasses of patterns for which positive learnability results may be achieved. A natural restriction on a pattern is a bound on its variable frequency – the maximum number $m$ such that some variable occurs exactly $m$ times in the pattern. This paper examines the…

Tables

Table 1: TD and PBTD of various pattern classes. In each entry, $m\geq 1$, the universal (resp. existential) quantifier is taken over all patterns belonging to the class in the corresponding row, and $\Pi$ refers to the class in the corresponding row.

Class	$z = 1$	$2 \leq z < \infty$	$z = \infty$
$\mbox{SR}\Pi^{z}$	$TD = 2$, $PBTD = 1$ (Thm 3)	$TD = 2$, $PBTD = 1$ (Thm 3)	$TD = 2$, $PBTD = 1$ (Thm 3)
$\mbox{QR}\Pi_{\infty, m}^{z}$	$TD = 3$ (Thm 10); $PBTD = 2$ (Prop 12)	$(\forall \pi) [TD (\pi, \Pi) < \infty]$ (Thm 10); $PBTD \geq 2$ (Prop 12)	$(\forall \pi) [TD (\pi, \Pi) < \infty]$ (Thm 10); $PBTD = 2$ (Prop 12, Thm 16)
$\mbox{NC}\Pi_{\infty, m}^{z}$	$TD / PBTD = \Theta (m)$ for $m \geq 2$, $TD / PBTD = 0$ for $m = 1$ (Thm 14)	$TD = o (m)$, $PBTD = 1$ for $m \geq 2$, $PBTD = 0$ for $m = 1$ (Thm 14)	$TD = o (m)$, $PBTD = 0$ for $m = 1$ (Thm 14)
$\Pi_{\infty, m}^{z}$	$TD = O (2^{m})$ (Thm 20(i)); $PBTD = \Theta (m)$ (Thm 16)	$(\exists \pi) [TD (\pi, \Pi_{\infty, 4, cf}^{2}) = \infty]$ (Thm 21); $PBTD \geq 2$ (Prop 12)	$(\forall \pi) [TD (\pi, \Pi) < \infty]$ (Thm 20(i)); $PBTD = 2$ (Prop 12, Thm 16)

Equations

\widehat{w} := \delta_{1}^{m_{1}-1}\delta_{2}^{m_{2}}\delta_{1}\,\delta_{2}^{m_{2}-1}\delta_{3}^{m_{3}}\delta_{2}\ldots\delta_{i}^{m_{i}-1}\delta_{i+1}^{m_{i+1}}\delta_{i}\ldots\delta_{k-1}^{m_{k-1}-1}\delta_{k}^{m_{k}}\delta_{k-1}\,\delta_{k}^{m_{k}-1}.

w := I_{1}(01)^{i_{1}}I_{2}(001)^{i_{2}}\ldots I_{j}(0^{j}1)^{i_{j}}\ldots I_{\ell-1}(0^{\ell-1}1)^{i_{\ell-1}}I_{\ell}(0^{\ell}1)^{i_{\ell}}\in L(\pi),

\gamma = h(x_{1})\,01\,I_{1}\,h(x_{2})\,11\,I_{2}\,h(x_{1})\,01\,h(x_{2})\,11\,h(x_{1})\,01,

\gamma = \overbrace{\underbrace{abbb}_{h(x_{1}x_{2})}}^{{\mathcal{I}}_{h,\pi}([1,2])=[1,4]}\;\overbrace{\underbrace{ab}_{h(x_{1})}}^{\overline{{\mathcal{I}}}_{h,\pi}(5)=\{3\}}\;\overbrace{\underbrace{bbab}_{h(x_{2}x_{1})}}^{{\mathcal{I}}_{h,\pi}([4,5])=[7,10]}.

w_{2} = \psi(x_{1})a_{i_{1}}\psi(x_{2})a_{i_{2}}\beta_{1}\ldots a_{i_{j}}\psi(x_{j+1})a_{i_{j+1}}\beta_{j}\ldots a_{i_{n-2}}\psi(x_{n-1})a_{i_{n-1}}\beta_{n-2}\psi(x_{n}).

\alpha_{j}=\begin{cases}a_{i_{j+1}}a_{j^{\prime\prime}}a_{j^{\prime}}a_{i_{j}}&\mbox{if $a_{i_{j}}\neq a_{i_{j+1}}$;}\\ a_{j^{\prime}}a_{i_{j}}a_{j^{\prime\prime}}&\mbox{if $a_{i_{j}}=a_{i_{j+1}}$.}\end{cases}

\alpha_{j}=\begin{cases}a_{i_{j+1}}a_{i_{j}}&\mbox{if $a_{i_{j}}\neq a_{i_{j+1}}$;}\\ a_{i_{j}}&\mbox{if $a_{i_{j}}=a_{i_{j+1}}$.}\end{cases}

\alpha_{j}=\begin{cases}\psi(x_{1})a_{i_{j+1}}\psi(x_{1})a_{i_{j}}&\mbox{if $a_{i_{j}}\neq a_{i_{j+1}}$;}\\ \psi(x_{1})a_{i_{j}}&\mbox{if $a_{i_{j}}=a_{i_{j+1}}$.}\end{cases}

\alpha_{j}=\begin{cases}a_{i_{j+1}}\psi(x_{n})a_{i_{j}}\psi(x_{n})&\mbox{if $a_{i_{j}}\neq a_{i_{j+1}}$;}\\ a_{i_{j}}\psi(x_{n})&\mbox{if $a_{i_{j}}=a_{i_{j+1}}$.}\end{cases}

\alpha_{j}=\begin{cases}a_{j_{1}}a_{i_{j+1}}a_{j_{2}}\psi(x_{j})a_{j_{2}}a_{i_{j}}a_{j_{1}}&\mbox{if $a_{j_{3}}=a_{i_{j}}$;}\\ a_{j_{1}}a_{i_{j+1}}a_{j_{2}}\psi(x_{j})a_{j_{2}}a_{j_{3}}a_{i_{j}}a_{j_{1}}&\mbox{if $a_{j_{3}}\neq a_{i_{j}}$.}\end{cases}

\alpha_{j}=\begin{cases}a_{j_{2}}a_{i_{j+1}}\psi(x_{j+2})\psi(x_{j})a_{i_{j}}a_{j_{1}}a_{j_{2}}&\mbox{if $2\leq j\leq n-3$;}\\ \psi(x_{1})a_{j_{2}}a_{i_{j+1}}\psi(x_{j+2})\psi(x_{1})a_{i_{j}}a_{j_{1}}a_{j_{2}}&\mbox{if $j-1<1$;}\\ a_{j_{2}}a_{i_{j+1}}\psi(x_{n})\psi(x_{j})a_{i_{j}}a_{j_{1}}a_{j_{2}}\psi(x_{n})&\mbox{if $j+2>n-1$.}\end{cases}

w_{1} := \varphi(x_{1})a_{i_{1}}q_{1}\varphi(x_{2})a_{i_{2}}q_{2}\ldots\varphi(x_{i_{j}})a_{i_{j}}q_{j}\varphi(x_{i_{j+1}})\ldots\varphi(x_{n-1})a_{i_{n-1}}q_{n-1}\varphi(x_{n}).

w_{2} := \hbox to0.0pt{$\underbrace{\phantom{\psi(x_{1})a_{i_{1}}\psi(x_{2})}}_{R_{1}}$\hss}\psi(x_{1})a_{i_{1}}\overbrace{\psi(x_{2})a_{i_{2}}\psi(x_{3})}^{R_{2}}\ldots\overbrace{\psi(x_{j})a_{i_{j}}\psi(x_{j+1})}^{R_{j}}\ldots\hbox to0.0pt{$\overbrace{\phantom{\psi(x_{n-2})a_{i_{n-2}}\psi(x_{n-1})}}^{R_{n-2}}$\hss}\psi(x_{n-2})a_{i_{n-2}}\underbrace{\psi(x_{n-1})a_{i_{n-1}}\psi(x_{n})}_{R_{n-1}}.

w_{1} := \hbox to0.0pt{$\underbrace{\phantom{\varphi(x_{1})a_{i_{1}}\varphi(x_{2})}}_{P_{1}}$\hss}\varphi(x_{1})a_{i_{1}}\overbrace{\varphi(x_{2})a_{i_{2}}\varphi(x_{3})}^{P_{2}}\ldots\overbrace{\varphi(x_{j})a_{i_{j}}\varphi(x_{j+1})}^{P_{j}}\ldots\hbox to0.0pt{$\overbrace{\phantom{\varphi(x_{n-2})a_{i_{n-2}}\varphi(x_{n-1})}}^{P_{n-2}}$\hss}\varphi(x_{n-2})a_{i_{n-2}}\underbrace{\varphi(x_{n-1})a_{i_{n-1}}\varphi(x_{n})}_{P_{n-1}}.

w_{2} := \psi(x_{1})a_{i_{1}}r_{1}\psi(x_{2})a_{i_{2}}r_{2}\psi(x_{3})\ldots\psi(x_{j})a_{i_{j}}r_{j}\psi(x_{j+1})\ldots\psi(x_{n-2})a_{i_{n-2}}r_{n-2}\psi(x_{n-1})a_{i_{n-1}}r_{n-1}\psi(x_{n}).

w_{3} := \alpha_{1}Q_{1}\ldots\alpha_{j}Q_{j}\ldots\alpha_{n-2}Q_{n-2}.

\left|\{x:\mbox{$x$ is a free variable of $\pi^{\prime}$}\}\right|\geq|\mbox{Var}(\pi)|=n.

\left|\{x:\mbox{$x$ is a free variable of $\pi^{\prime}$}\}\right|\leq\ell k.

w := J_{1}\gamma_{1}H_{1}\varphi(x_{i_{1}})J_{2}\gamma_{2}H_{2}\varphi(x_{i_{2}})\ldots J_{j}\gamma_{j}H_{j}\varphi(x_{i_{j}})J_{j+1}\gamma_{j+1}\ldots H_{m\ell}\varphi(x_{i_{m\ell}})J_{m\ell+1}\gamma_{m\ell+1},

\gamma_{j}\varphi(x_{i_{j}})\gamma_{j+1}\ldots\gamma_{j+c+m+3}\varphi(x_{i_{j+c+m+3}}).

{\mathcal{I}}_{\theta,\pi}(p_{k})\cap(J_{j}\cup H_{j}\cup\ldots\cup J_{j+c+m+3}\cup H_{j+c+m+3})\neq\emptyset

J_{j+1}\cup H_{j+1}\cup\ldots\cup H_{j+c+m+2}\cup J_{j+c+m+3}\subseteq{\mathcal{I}}_{\theta,\pi}(q)

w^{\prime} := \gamma_{j+1}\varphi(x_{i_{j+1}})\ldots\varphi(x_{i_{j+c+m+2}})\gamma_{j+c+m+3}

|\mbox{Var}(\pi)|\geq\frac{1}{m}\cdot\frac{m\ell}{c+m+4}=\frac{\ell}{c+m+4},

|S|=|S^{\prime}|+1=\ell+1\leq 1+(c+m+4)\cdot|\mbox{Var}(\pi)|,

w := I_{1}a_{c(x_{l_{1}})}a_{\xi_{l_{1}}}^{p_{l_{1}}}a_{c(x_{l_{1}})}\ldots I_{i}a_{c(x_{l_{i}})}a_{\xi_{l_{i}}}^{p_{l_{i}}}a_{c(x_{l_{i}})}\ldots I_{n^{\prime}}a_{c(x_{l_{n^{\prime}}})}a_{\xi_{l_{n^{\prime}}}}^{p_{l_{n^{\prime}}}}a_{c(x_{l_{n^{\prime}}})}.

w_{1} := (01)^{n_{0}}(001)^{n_{1}}\ldots(0^{j}1)^{n_{i+1}}\ldots(0^{k+1}1)^{n_{k}}.

Taxonomy

Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Semigroups and automata theory

Full text

The Teaching Complexity of Erasing Pattern Languages With Bounded Variable Frequency

Ziyuan Gao

Department of Mathematics, National University of Singapore

10 Lower Kent Ridge Road, Singapore 119076, Republic of Singapore

Abstract

Patterns provide a concise, syntactic way of describing a set of strings, but their expressive power comes at a price: a number of fundamental decision problems concerning (erasing) pattern languages, such as the membership problem and inclusion problem, are known to be NP-complete or even undecidable, while the decidability of the equivalence problem is still open; in learning theory, the class of pattern languages is unlearnable in models such as the distribution-free (PAC) framework (if $\mathcal{P}/poly\neq\mathcal{NP}/poly$ ). Much work on the algorithmic learning of pattern languages has thus focussed on interesting subclasses of patterns for which positive learnability results may be achieved. A natural restriction on a pattern is a bound on its variable frequency – the maximum number $m$ such that some variable occurs exactly $m$ times in the pattern. This paper examines the effect of limiting the variable frequency of all patterns belonging to a class $\Pi$ on the worst-case minimum number of labelled examples needed to uniquely identify any pattern of $\Pi$ in cooperative teaching-learning models. Two such models, the teaching dimension model as well as the preference-based teaching model, will be considered.

1 Introduction

In the context of this paper, a pattern is a string made up of symbols from two disjoint sets, a countable set $X$ of variables and an alphabet $\Sigma$ of constants. The non-erasing pattern language generated by a pattern $\pi$ is the set of all words obtained by substituting nonempty words over $\Sigma$ for all the variables in $\pi$, under the condition that for any variable, all of its occurrences in $\pi$ must be replaced with the same word; the erasing pattern language generated by $\pi$ is defined analogously, the only difference being that the variables in $\pi$ may be replaced with the empty string. Unless stated otherwise, all pattern languages in the present paper refer to erasing pattern languages. In computational learning theory, the non-erasing pattern languages were introduced by Angluin [3] as a motivating example for her work on the identification of uniformly decidable families of languages in the limit. Shinohara [35] later introduced the class of erasing pattern languages, proving that the class of all such languages generated by regular patterns (patterns in which every variable occurs at most once) is polynomial-time learnable in the limit. Patterns and allied notions – such as that of an extended regular expression [1, 9, 33, 14], which has more expressive power than a pattern – have also been studied in other fields, including word combinatorics and pattern matching. For example, the membership problem for pattern languages is closely related to the problem of matching ‘patterns’ with variables (based on various definitions of ‘pattern’) in the pattern matching community [6, 2, 12, 10, 11].
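The difference between the erasing and non-erasing definitions can be made concrete with a small brute-force membership test. The following is only a sketch under assumptions that are not part of the paper: a pattern is encoded as a list of tokens, tokens starting with "x" are variables, every other token is a constant, and the function name `matches` is hypothetical. The search is exponential in general, which is in line with the hardness of the membership problem mentioned above.

```python
def matches(pattern, word, erasing=True, binding=None):
    """Brute-force test of whether `word` belongs to the language of `pattern`.

    `pattern` is a list of tokens; tokens starting with "x" are variables
    (an encoding assumed for this sketch), all other tokens are constants.
    """
    binding = {} if binding is None else binding
    if not pattern:
        return word == ""
    head, rest = pattern[0], pattern[1:]
    if not head.startswith("x"):  # constant: must be read literally
        return word.startswith(head) and matches(rest, word[len(head):], erasing, binding)
    if head in binding:  # repeated variable: reuse the image chosen earlier
        image = binding[head]
        return word.startswith(image) and matches(rest, word[len(image):], erasing, binding)
    shortest = 0 if erasing else 1  # only erasing languages allow the empty image
    for cut in range(shortest, len(word) + 1):
        binding[head] = word[:cut]
        if matches(rest, word[cut:], erasing, binding):
            return True
    binding.pop(head, None)  # undo the tentative binding before backtracking
    return False


pattern = ["x1", "0", "x1"]  # the pattern x1 0 x1
in_erasing = matches(pattern, "0")                     # x1 replaced by the empty string
in_non_erasing = matches(pattern, "0", erasing=False)  # x1 must be nonempty
```

Here the word 0 belongs to the erasing language of x1 0 x1 but not to its non-erasing language, while 101 belongs to both.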

The present paper considers the problem of uniquely identifying pattern languages from labelled examples – where a labelled example for a pattern language $L$ is a pair $(w,*)$ such that $*$ is “$+$” if $w$ belongs to $L$ and “$-$” otherwise – based on formal teaching-learning models. We shall study two such models in the computational learning theory literature: the well-known teaching dimension (TD) model [19, 34] and the preference-based teaching (PBT) model [17] (cf. Section 3). Given a model $\mathcal{T}$ and any class $\Pi$ of patterns to be learnt, the maximum size of a sample (possibly $\infty$) needed for a learner to successfully identify any pattern in $\Pi$ based on the teaching-learning algorithm of $\mathcal{T}$ is known as the teaching complexity of $\Pi$ (according to $\mathcal{T}$). The broad question we try to partly address is: what properties of the patterns in a given class $\Pi$ of patterns influence the teaching complexity of $\Pi$ according to the TD and PBT models? More specifically, let $\Pi_{m}$ be a class of patterns $\pi$ such that the maximum number of times any single variable occurs in $\pi$ (known here as the variable frequency of $\pi$) is at most $m$; how does the teaching complexity of $\Pi_{m}$ vary with $m$? The variable frequency of a pattern is quite a natural parameter that has been investigated in other problems concerning pattern languages. For example, Matsumoto and Shinohara [27] established an upper bound on the query complexity of learning (non-erasing) pattern languages in terms of the variable frequency of the pattern and other parameters; Fernau and Schmid [13] proved that the membership problem for patterns remains NP-complete even when the variable frequency is restricted to $2$ (along with other parameter restrictions).

In this paper, one motivation for concentrating on the variable frequency of a pattern rather than, say, the number of distinct variables occurring in the pattern, comes from examining the teaching complexity of some basic patterns. Take the constant pattern $0$, where $0$ is a letter in the alphabet $\Sigma$ of constants. The language generated by this pattern cannot be finitely distinguished (i.e., distinguished using a finite set of labelled examples) from every other pattern language, even only those generated by a pattern with at most one variable. Indeed, any finite set $\{(0,+),(w_{1},-),\ldots,(w_{k},-)\}$ of labelled examples for the pattern $0$ is also consistent with the pattern $0x^{m}$ where $m=\max_{1\leq i\leq k}|w_{i}|$. The latter observation depends crucially on the fact that a variable may occur any number of times in a pattern, and less so on the number of distinct variables occurring in a pattern. A similar remark applies to the pattern languages generated by patterns with a constant part of length at least $2$ [7, Theorem 3]. On the other hand, if one were to teach the singleton language $\{0\}$ w.r.t. all languages generated by patterns with variable frequency at most $k$ for some fixed $k$, then a finite distinguishing set for $\{0\}$ could consist of $(0,+)$ plus all negative examples $(0^{n},-)$ with $2\leq n\leq k+1$. This seems to suggest that the maximum variable frequency of the patterns in a class of patterns may play a crucial role in determining whether or not the languages generated by members of this class are finitely distinguishable.
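The consistency argument for the pattern $0x^{m}$ can be checked on a concrete sample. The sketch below assumes a hypothetical helper `in_L_0xm` and an arbitrarily chosen list of negative examples; it uses the fact that a word lies in the erasing language of $0x^{m}$ exactly when it is $0$ followed by an $m$-th power.

```python
def in_L_0xm(word, m):
    """Membership in the erasing language of the one-variable pattern 0 x^m (m >= 1)."""
    if not word.startswith("0"):
        return False
    rest = word[1:]
    if len(rest) % m:  # the part after the leading 0 must split into m equal blocks
        return False
    block = rest[: len(rest) // m]
    return block * m == rest


# Hypothetical negative examples for the constant pattern 0 (none of them equals "0").
negatives = ["", "1", "00", "010", "0110"]
m = max(len(w) for w in negatives)

keeps_positive = in_L_0xm("0", m)                             # (0, +) is still satisfied
rejects_negatives = not any(in_L_0xm(w, m) for w in negatives)
strictly_larger = in_L_0xm("0" + "1" * m, m)                  # yet L(0 x^m) differs from {0}
```

Every nonempty image of $x$ produces a word of length at least $m+1$, which is longer than each negative example; this is why no finite sample of this form separates $0$ from all patterns $0x^{m}$.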

The first section of this work studies the teaching complexity of simple block-regular patterns, which are equivalent to patterns of the shape $x_{1}a_{1}x_{2}a_{2}\ldots a_{n-1}x_{n}$, where $x_{1},\ldots,x_{n}$ are distinct variables and $a_{1},\ldots,a_{n-1}$ are constants. They make up one of the simplest, non-trivial classes of patterns that have a restriction on the variable frequency. Bayeh et al. [7] showed that over alphabets of size at least $4$, the languages generated by such patterns are precisely those that are finitely distinguishable; we refine this result by determining, over any alphabet, the TD and PBT dimensions of the class of simple block-regular patterns. Further, we calculate the TD of these patterns w.r.t. the class of regular patterns and provide an asymptotic lower bound for the TD of any given simple block-regular pattern w.r.t. the whole class of patterns. In the subsequent section, we proceed to the more general problem of determining, for various natural classes $\Pi$ of patterns that have a uniformly bounded variable frequency, those members of $\Pi$ that are finitely distinguishable. It will be proven that all $m$-quasi-regular patterns (i.e. every variable of the pattern occurs exactly $m$ times) and $m$-regular (i.e. every variable occurs at most $m$ times) non-cross patterns are finitely distinguishable w.r.t. the class of $m$-quasi-regular and $m$-regular non-cross patterns respectively; moreover, the TD of the class of $m$-regular non-cross patterns is even finite and in fact sublinear in $m$. Next, we present partial results on the problem of determining the subclass of $m$-regular patterns that have a finite TD. Over any infinite alphabet, every $m$-regular pattern is finitely distinguishable – contrasting quite sharply with the previously mentioned theorem that over alphabets with at least $4$ letters, the only patterns with a finite TD are the simple block-regular ones. Over binary alphabets, on the other hand, there are patterns that are not finitely distinguishable even when the variable frequency is restricted to $4$.

Due to space constraints, most proofs have been deferred to the appendix.

2 Preliminaries

${\mathbb{N}}_{0}$ denotes the set of natural numbers $\{0,1,2,\ldots\}$ and ${\mathbb{N}}={\mathbb{N}}_{0}\setminus\{0\}$ . Let $X=\{x_{1},x_{2},x_{3},\ldots\}$ be an infinite set of variable symbols. An alphabet is a finite or countably infinite set of symbols, disjoint from $X$ . Fix an alphabet $\Sigma$ . A pattern is a nonempty finite string over $X\cup\Sigma$ . The class of patterns over any alphabet $\Sigma$ with $z=|\Sigma|$ is denoted by $\Pi^{z}$ ; this notation reflects the fact that all the properties of patterns and classes of patterns considered in the present work depend only on the size of the alphabet and not on the actual letters of the alphabet. The erasing pattern language $L(\pi)$ generated by a pattern $\pi$ over $\Sigma$ consists of all strings generated from $\pi$ when replacing variables in $\pi$ with any string over $\Sigma$ , where all occurrences of a single variable must be replaced by the same string [35]. Patterns $\pi$ and $\tau$ over $\Sigma$ are said to be equivalent iff $L(\pi)=L(\tau)$ ; they are similar iff $\pi=\alpha_{1}u_{1}\alpha_{2}u_{2}\ldots u_{n}\alpha_{n}$ and $\tau=\beta_{1}u_{1}\beta_{2}u_{2}\ldots u_{n}\beta_{n}$ for some $u_{1},u_{2},\ldots,u_{n}\in\Sigma^{+}$ and $\alpha_{1},\ldots,\alpha_{n},\beta_{1},\ldots,\beta_{n}\in X^{*}$ . Unless specified otherwise, we identify any pattern $\pi$ belonging to a class $\Pi$ of patterns with every other $\pi^{\prime}\in\Pi$ such that $L(\pi)=L(\pi^{\prime})$ . $\mbox{Var}(\pi)$ (resp. $\mbox{Const}(\pi)$ ) denotes the set of all distinct variables (resp. constant symbols) occurring in $\pi$ .

For any symbol $a$ and $n\in{\mathbb{N}}_{0}$, $a^{n}$ denotes the string equal to $n$ concatenated copies of $a$. For any alphabets $A$ and $B$, a morphism is a function $h:A^{*}\rightarrow B^{*}$ with $h(uv)=h(u)h(v)$ for all $u,v\in A^{*}$. A substitution is a morphism $h:(\Sigma\cup X)^{*}\rightarrow\Sigma^{*}$ with $h(a)=a$ for all $a\in\Sigma$. By abuse of notation, we will often use the same symbol $h$ to represent the morphism $(X\cup\Sigma)^{*}\rightarrow\Sigma^{*}$ that coincides with the substitution $h$ on individual variables and with the identity function on letters from $\Sigma$. ${\mathcal{I}}_{h,\pi}$ denotes the mapping of closed intervals of positions of $\pi$ to closed intervals of positions of $h(\pi)$ induced by $h$; $\pi(\varepsilon)$ denotes the word obtained from $\pi$ by substituting $\varepsilon$ for every variable in $\pi$. Let $\sqsubseteq$ denote the subsequence relation on $\Sigma^{*}$: $u\sqsubseteq v$ holds iff there are numbers $i_{1}<i_{2}<\ldots<i_{|u|}$ such that $v_{i_{j}}=u_{j}$ for all $j\in\{1,\ldots,|u|\}$. Given any $u,v\in\Sigma^{*}$, the shuffle product of $u$ and $v$, denoted by $u\shuffle v$, is the set $\{u_{1}v_{1}u_{2}v_{2}\ldots u_{k}v_{k}:u_{i},v_{i}\in\Sigma^{*}\wedge u_{1}u_{2}\ldots u_{k}=u\wedge v_{1}v_{2}\ldots v_{k}=v\}$. Given any $A,B\subseteq\Sigma^{*}$, the shuffle product of $A$ and $B$, denoted by $A\shuffle B$, is the set $\bigcup_{u\in A\wedge v\in B}u\shuffle v$. If $A=\{u\}$, we will often write $A\shuffle B$ as $u\shuffle B$.
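The subsequence relation and the shuffle product of two words are easy to experiment with. The following is a minimal sketch; the function names `is_subsequence` and `shuffle` are chosen here for illustration, and `shuffle` enumerates all interleavings, which coincides with the definition above because the factors $u_{i},v_{i}$ may be empty.

```python
def is_subsequence(u, v):
    """Return True iff u is a (not necessarily contiguous) subsequence of v."""
    remaining = iter(v)
    # `ch in remaining` advances the iterator, so matches are found left to right.
    return all(ch in remaining for ch in u)


def shuffle(u, v):
    """Set of all interleavings of the words u and v (their shuffle product)."""
    if not u:
        return {v}
    if not v:
        return {u}
    take_u = {u[0] + w for w in shuffle(u[1:], v)}
    take_v = {v[0] + w for w in shuffle(u, v[1:])}
    return take_u | take_v


example = shuffle("01", "1")
```

For instance, `shuffle("01", "1")` yields the two words 011 and 101, and every word in a shuffle product of $u$ with any $v$ has $u$ as a subsequence.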

3 Teaching Dimension and Preference-based Teaching Dimension

Machine teaching focusses on the problem of designing, for any given learning algorithm, an optimal training set for every concept belonging to a class of concepts to be learnt [36]. Such a training set is sometimes known as a teaching set. In this work, an “optimal” teaching set for a pattern $\pi$ is one that has the minimum number of examples labelled consistently with $\pi$ needed for the algorithm to successfully identify $\pi$ (up to equivalence). We study the design of optimal teaching sets for various classes of pattern languages w.r.t. (i) the classical teaching dimension model [19, 34], where it is only assumed that the learner’s hypotheses are always consistent with the given teaching set; (ii) the preference-based teaching model [17], where the learner has, for any given concept class, a particular “preference relation” on the class, and the learner’s hypotheses are always not only consistent with the given teaching set, but also not less preferred to any other concept in the class w.r.t. the preference relation.

Fix an alphabet $\Sigma$. Let $\Pi$ be any class of patterns, and suppose $\pi\in\Pi$. A teaching set for $\pi$ w.r.t. $\Pi$ is a set $T\subseteq\Sigma^{*}\times\{+,-\}$ that is consistent with $\pi$ but with no other pattern in $\Pi$ (up to equivalence); here $T$ is consistent with $\pi$ if $w\in L(\pi)$ for all $(w,+)\in T$ and $w\notin L(\pi)$ for all $(w,-)\in T$. The teaching dimension of $\pi$ w.r.t. $\Pi$, denoted by $\mbox{TD}(\pi,\Pi)$, is defined as $\mbox{TD}(\pi,\Pi)=\inf\{|T|:T\mbox{ is a teaching set for }\pi\mbox{ w.r.t.\ }\Pi\}.$ Furthermore, if $\Pi^{\prime}\subseteq\Pi$, then the teaching dimension of $\Pi^{\prime}$ w.r.t. $\Pi$, denoted by $\mbox{TD}(\Pi^{\prime},\Pi)$, is defined as $\mbox{TD}(\Pi^{\prime},\Pi)=\sup\{\mbox{TD}(\pi,\Pi):\pi\in\Pi^{\prime}\}.$ The teaching dimension of $\Pi$, denoted by $\mbox{TD}(\Pi)$, is defined as $\mbox{TD}(\Pi,\Pi)$.
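To see the definition of a teaching set at work, one can search exhaustively over a small toy class. The sketch below makes several assumptions that are not in the paper: examples are drawn only from a finite pool of binary words of length at most 2, the class consists of five simple block-regular patterns identified by their constant parts, and the language of such a pattern is computed as the set of words containing the constant part as a subsequence. The values obtained describe this toy setting only, not the classes studied in the paper.

```python
from itertools import combinations, product


def is_subsequence(u, v):
    remaining = iter(v)
    return all(ch in remaining for ch in u)


def smallest_teaching_set(target, concepts, pool):
    """Smallest labelled sample from `pool` that is consistent with `target` only."""
    rivals = [c for c in concepts if c != target]
    for size in range(len(pool) + 1):
        for words in combinations(pool, size):
            # Every rival must disagree with the target on at least one chosen word.
            if all(any((w in c) != (w in target) for w in words) for c in rivals):
                return [(w, "+" if w in target else "-") for w in words]
    return None  # the pool cannot separate the target from some rival


# Toy setting (assumed): binary words of length <= 2, patterns named by constant part.
pool = ["".join(p) for n in range(3) for p in product("01", repeat=n)]
language = {u: frozenset(w for w in pool if is_subsequence(u, w))
            for u in ["", "0", "1", "01", "10"]}

teach_0 = smallest_teaching_set(language["0"], language.values(), pool)
teach_01 = smallest_teaching_set(language["01"], language.values(), pool)
```

In this toy class the pattern with constant part 0 needs two examples (one positive, one negative), whereas the pattern with constant part 01 is already pinned down by the single negative example 10.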

In real-world learning scenarios, even the smallest possible teaching set for a given concept relative to some concept class may be impractically large. Learning algorithms often make predictions based on a set of assumptions known as the inductive bias, which may allow the algorithm to infer a target concept from a small set of data even when there is more than one concept in the class that is consistent with the data. Certain types of bias impose an a priori preference ordering on the learner’s hypothesis space; for example, an algorithm that adheres to the Minimum Description Length (MDL) principle favours hypotheses that have shorter descriptions based on some given description language. The preference-based teaching model, to be defined shortly, considers learning algorithms with an inductive bias that specifies a preference ordering of the learner’s hypotheses.

Let $\prec$ be a strict partial order on $\Pi$, i.e., $\prec$ is asymmetric and transitive. The partial order that makes every pair $\pi,\pi^{\prime}\in\Pi$ (where $L(\pi)\neq L(\pi^{\prime})$) incomparable is denoted by $\prec_{\emptyset}$. For every $\pi\in\Pi$, let $\Pi_{\prec\pi}=\{\pi^{\prime}\in\Pi:\pi^{\prime}\prec\pi\}$ be the set of patterns over which $\pi$ is strictly preferred (as mentioned earlier, equivalent patterns are identified with each other). A teaching set for $\pi$ w.r.t. $(\Pi,\prec)$ is defined as a teaching set for $\pi$ w.r.t. $\Pi\setminus\Pi_{\prec\pi}$. Furthermore define $\mbox{PBTD}(\pi,\Pi,\prec)=\inf\{|T|:T\mbox{ is a teaching set for }\pi\mbox{ w.r.t.\ }(\Pi,\prec)\}\in{\mathbb{N}}_{0}\cup\{\infty\}$. The number $\mbox{PBTD}(\Pi,\prec)=\sup_{\pi\in\Pi}\mbox{PBTD}(\pi,\Pi,\prec)\in{\mathbb{N}}_{0}\cup\{\infty\}$ is called the teaching dimension of $(\Pi,\prec)$. The preference-based teaching dimension of $\Pi$ is given by $\mbox{PBTD}(\Pi)=\inf\{\mbox{PBTD}(\Pi,\prec):\prec\mbox{ is a strict partial order on }\Pi\}.$ For all pattern classes $\Pi$ and $\Pi^{\prime}$ with $\Pi^{\prime}\subseteq\Pi$, $K(\Pi^{\prime})\leq K(\Pi)$ for $K\in\{\mbox{TD},\mbox{PBTD}\}$ (i.e. the TD and PBTD are monotonic) and $\mbox{PBTD}(\Pi)\leq\mbox{TD}(\Pi)$ [17].
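The effect of a preference relation can be illustrated on a finite toy class. The sketch below rests on assumptions that are not taken from the paper: examples come from a finite pool of binary words of length at most 2, the class consists of five simple block-regular patterns named by their constant parts, and the order "a longer constant part is preferred" is just one hypothetical choice of $\prec$. The function names `td` and `pbtd` are illustrative. Since only one order is tried, the second value is merely an upper bound on the PBTD of the toy class.

```python
from itertools import combinations, product


def is_subsequence(u, v):
    remaining = iter(v)
    return all(ch in remaining for ch in u)


def td(target, rivals, pool):
    """Size of a smallest sample from `pool` separating `target` from all `rivals`."""
    for size in range(len(pool) + 1):
        for words in combinations(pool, size):
            if all(any((w in c) != (w in target) for w in words) for c in rivals):
                return size
    return float("inf")


def pbtd(languages, less_preferred, pool):
    """Teaching dimension of (class, order): rivals strictly below the target are dropped."""
    worst = 0
    for u, target in languages.items():
        rivals = [c for v, c in languages.items()
                  if v != u and not less_preferred(v, u)]
        worst = max(worst, td(target, rivals, pool))
    return worst


pool = ["".join(p) for n in range(3) for p in product("01", repeat=n)]
languages = {u: frozenset(w for w in pool if is_subsequence(u, w))
             for u in ["", "0", "1", "01", "10"]}

plain_td = pbtd(languages, lambda v, u: False, pool)            # empty order: ordinary TD
with_order = pbtd(languages, lambda v, u: len(v) < len(u), pool)
```

With the empty order the toy class needs two examples in the worst case; once shorter constant parts are ranked below longer ones, a single positive example suffices for every pattern, illustrating $\mbox{PBTD}(\Pi)\leq\mbox{TD}(\Pi)$.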

4 Simple Block-Regular Patterns

Fix an alphabet $\Sigma$ of size $z\leq\infty$. A pattern $\pi\in\Pi^{z}$ is said to be simple block-regular if it is of the shape $X_{1}a_{1}X_{2}a_{2}\ldots a_{n-1}X_{n}$, where $X_{1},\ldots,X_{n}\in X^{+}$, $a_{1},\ldots,a_{n-1}\in\Sigma$, and for all $i\in\{1,\ldots,n\}$, $X_{i}$ contains a variable that does not occur in any other variable block $X_{j}$ with $j\neq i$. Every simple block-regular pattern is equivalent to a pattern $\pi^{\prime}$ of the shape $y_{1}a_{1}y_{2}a_{2}\ldots a_{k}y_{k+1}$, where $k\geq 0$, $a_{1},a_{2},\ldots,a_{k}\in\Sigma$ and $y_{1},y_{2},\ldots,y_{k+1}$ are $k+1$ distinct variables [20, Theorem 6(b)]. $\mbox{SR}\Pi^{z}$ denotes the class of all simple block-regular patterns in $\Pi^{z}$. $\mbox{SR}\Pi^{z}$ is a subclass of the family of regular patterns (denoted by $\mbox{R}\Pi^{z}$), which are patterns in which every variable occurs at most once.

As mentioned in the introduction, the simple block-regular patterns constitute precisely the subclass of finitely distinguishable patterns over any alphabet of size at least $4$ [7, Theorem 3]. The language generated by a simple block-regular pattern is known as a principal shuffle ideal in word combinatorics [25, §6.1], and the family of all such languages is an important object of study in the PAC learning model [5].

The goal of this section is to determine the teaching complexity of the class of simple block-regular patterns over any alphabet $\Sigma$ w.r.t. three classes: $\mbox{SR$ \Pi $}^{|\Sigma|}$ itself, $\mbox{R}\Pi^{|\Sigma|}$ and $\Pi^{|\Sigma|}$ . It will be shown that $\mbox{TD}(\mbox{SR$ \Pi $}^{|\Sigma|})<\mbox{TD}(\mbox{SR$ \Pi $}^{|\Sigma|},\mbox{R}\Pi^{|\Sigma|})$ $<\mbox{TD}(\mbox{SR$ \Pi $}^{|\Sigma|},\Pi^{|\Sigma|})$ . To this end, we introduce a uniform construction of a certain negative example for any given pattern $\pi$ ; as will be seen shortly, this example is powerful enough to distinguish $\pi$ from every simple block-regular pattern whose constant part is a proper subsequence (not necessarily contiguous) of the constant part of $\pi$ .

Notation 1

For any word $w=\delta_{1}^{m_{1}}\delta_{2}^{m_{2}}\ldots\delta_{k}^{m_{k}}$ , where $\delta_{1},\ldots,\delta_{k}\in\Sigma$ and $\delta_{i}\neq\delta_{i+1}$ whenever $1\leq i<k$ , $m_{1},\ldots,m_{k}\geq 1$ and $k\geq 1$ , define

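$$\widehat{w}=\begin{cases}\delta_{1}^{m_{1}-1}&\mbox{if }k=1;\\ \underbrace{\delta_{1}^{m_{1}-1}\delta_{2}^{m_{2}}\delta_{1}}\underbrace{\delta_{2}^{m_{2}-1}\delta_{3}^{m_{3}}\delta_{2}}\ldots\underbrace{\delta_{k-1}^{m_{k-1}-1}\delta_{k}^{m_{k}}\delta_{k-1}}\underbrace{\delta_{k}^{m_{k}-1}}&\mbox{if }k\geq 2.\end{cases}\qquad(1)$$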

(In particular, if $m\geq 1$ , then $\widehat{\delta_{1}^{m}}=\delta_{1}^{m-1}$ .)

Lemma 2

Fix any $z\in{\mathbb{N}}\cup\{\infty\}$ and any $\pi,\tau\in\mbox{SR$ \Pi $}^{z}$ with $\pi(\varepsilon)\neq\varepsilon$ . Then $\widehat{\pi(\varepsilon)}\notin L(\pi)$ . Furthermore, if $\tau(\varepsilon)\sqsubset\pi(\varepsilon)$ , then $\widehat{\pi(\varepsilon)}\in L(\tau)$ .

Proof. Suppose $\pi(\varepsilon)=\delta_{1}^{m_{1}}\delta_{2}^{m_{2}}\ldots\delta_{k}^{m_{k}}$ , where $\delta_{1},\ldots,\delta_{k}\in\Sigma$ and $\delta_{i}\neq\delta_{i+1}$ whenever $1\leq i<k$ , $m_{1},\ldots,$ $m_{k}\geq 1$ and $k\geq 1$ . That $\widehat{\pi(\varepsilon)}\notin L(\pi)$ may be argued as follows: if $k=1$ , then $\widehat{\pi(\varepsilon)}=\delta_{1}^{m_{1}-1}\sqsubset\pi(\varepsilon)$ is immediate; if $k\geq 2$ , then one shows by induction that for $i=1,\ldots,k-1$ , $\delta_{1}^{m_{1}}\delta_{2}^{m_{2}}\ldots\delta_{i}^{m_{i}}\delta_{i+1}\not\sqsubseteq\underbrace{\delta_{1}^{m_{1}-1}\delta_{2}^{m_{2}}\delta_{1}}\underbrace{\delta_{2}^{m_{2}-1}\delta_{3}^{m_{3}}\delta_{2}}\ldots$ $\underbrace{\delta_{i}^{m_{i}-1}\delta_{i+1}^{m_{i+1}}\delta_{i}}$ . For the second part of the lemma, suppose $\tau(\varepsilon)=\delta_{1}^{n_{1}}\delta_{2}^{n_{2}}\ldots\delta_{k}^{n_{k}}$ , where $0\leq n_{i}\leq m_{i}$ for all $i\in\{1,\ldots,k\}$ and $n_{i_{0}}\leq m_{i_{0}}-1$ for some least number $i_{0}$ . Taking $w=\pi(\varepsilon)$ in Equation (1), observe that $\delta_{i}^{n_{i}}\sqsubseteq\delta_{i}^{m_{i}-1}\delta_{i+1}^{m_{i+1}}\delta_{i}$ for all $i<i_{0}$ , $\delta_{i_{0}}^{n_{i_{0}}}\sqsubseteq\delta_{i_{0}}^{m_{i_{0}}-1}$ , and $\delta_{j}^{n_{j}}\sqsubseteq\delta_{j}^{m_{j}}\delta_{j-1}\delta_{j}^{m_{j}-1}$ for all $j>i_{0}$ . Thus, since $\tau$ is simple block-regular, one has that $\widehat{\pi(\varepsilon)}\in L(\tau)$ .
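Lemma 2 can be checked mechanically on small instances. The Python sketch below assumes that $\widehat{w}$ is the concatenation of the blocks $\delta_{i}^{m_{i}-1}\delta_{i+1}^{m_{i+1}}\delta_{i}$ for $i<k$ followed by $\delta_{k}^{m_{k}-1}$, as read off from the proof above and from the word $w_{3}=01101$ in Example D.1. It also uses the fact that a word belongs to the language of a simple block-regular pattern $\tau$ exactly when $\tau(\varepsilon)$ is a (scattered) subsequence of it. The function names are hypothetical.

```python
from itertools import combinations, groupby, product

def hat(word):
    """Assumed form of the negative example built from a nonempty word (Notation 1)."""
    if not word:
        raise ValueError("hat is only defined for nonempty words")
    runs = [(letter, len(list(group))) for letter, group in groupby(word)]
    pieces = []
    for (d, m), (d_next, m_next) in zip(runs, runs[1:]):
        # block: delta_i^(m_i - 1) delta_{i+1}^(m_{i+1}) delta_i
        pieces.append(d * (m - 1) + d_next * m_next + d)
    last_letter, last_len = runs[-1]
    pieces.append(last_letter * (last_len - 1))
    return "".join(pieces)

def is_subsequence(u, w):
    """True iff u can be obtained from w by deleting letters."""
    it = iter(w)
    return all(ch in it for ch in u)

def proper_subsequences(w):
    """All subsequences of w that are strictly shorter than w."""
    return {"".join(w[i] for i in idx)
            for size in range(len(w))
            for idx in combinations(range(len(w)), size)}

def lemma2_holds(constants):
    h = hat(constants)
    rejected_by_pi = not is_subsequence(constants, h)
    accepted_by_smaller = all(is_subsequence(t, h)
                              for t in proper_subsequences(constants))
    return rejected_by_pi and accepted_by_smaller

# Exhaustive check over all nonempty binary constant parts of length at most 6.
all_ok = all(lemma2_holds("".join(w))
             for n in range(1, 7) for w in product("01", repeat=n))
```

This is a finite sanity check under the stated assumption on $\widehat{w}$, not a substitute for the proof.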

Lemma 2 now provides a tool for establishing the TD of $\mbox{SR$ \Pi $}^{z}$ .

Theorem 3

For any $z\in{\mathbb{N}}\cup\{\infty\}$ , $\mbox{TD}(\mbox{SR$ \Pi $}^{z})=2$ and $\mbox{PBTD}(\mbox{SR$ \Pi $}^{z})=1$ .

Proof. Fix any $0\in\Sigma$ . The pattern $\pi\mathrel{\mathop{\mathchar 58\relax}}=x_{1}0x_{2}$ needs to be taught with at least one negative example in order to distinguish it from $x_{1}$ . Suppose a teaching set for $\pi$ contains $(w_{1}w_{2}\ldots w_{k},-)$ , where $w_{1},\ldots,w_{k}\in\Sigma$ . For any $m\geq 3$ , $w_{1}w_{2}\ldots w_{k}\notin L(\pi^{\prime})$ , where $\pi^{\prime}\mathrel{\mathop{\mathchar 58\relax}}=x_{1}w_{1}x_{2}w_{2}x_{3}\ldots x_{k}w_{k}x_{k+1}0x_{k+2}0$ $\ldots 0x_{k+m}$ . Since $\pi^{\prime}$ is simple block-regular and $L(\pi^{\prime})\neq L(\pi)$ , at least one additional example is required to distinguish $\pi$ from $\pi^{\prime}$ . Hence $\mbox{TD}(\mbox{SR$ \Pi $}^{z})\geq 2$ .

Let $\pi$ be any simple block-regular pattern. Since $x_{1}$ can be taught with the single example $(\varepsilon,+)$ , we will suppose that $\pi(\varepsilon)\neq\varepsilon$ . A teaching set for $\pi$ consists of the two examples $(\pi(\varepsilon),+)$ and $(\widehat{\pi(\varepsilon)},-)$ . By Lemma 2, $(\widehat{\pi(\varepsilon)},-)$ is consistent with $\pi$ and $(\widehat{\pi(\varepsilon)},-)$ distinguishes $\pi$ from all patterns $\pi^{\prime}$ such that $\pi^{\prime}(\varepsilon)\sqsubset\pi(\varepsilon)$ , while $(\pi(\varepsilon),+)$ distinguishes $\pi$ from all patterns $\pi^{\prime\prime}$ such that $\pi^{\prime\prime}(\varepsilon)\not\sqsubseteq\pi(\varepsilon)$ .

Let $\prec$ be a preference relation on $\mbox{SR$ \Pi $}^{z}$ such that for any $\pi,\tau\in\mbox{SR$ \Pi $}^{z}$ with $L(\pi)\neq L(\tau)$ , $\pi\prec\tau$ iff $\left|\pi(\varepsilon)\right|<\left|\tau(\varepsilon)\right|$ . Every $\pi\in\mbox{SR$ \Pi $}^{z}$ can be taught w.r.t. $(\mbox{SR$ \Pi $}^{z},\prec)$ using the example $(\pi(\varepsilon),+)$ : for every $\tau\in\mbox{SR$ \Pi $}^{z}$ such that $L(\tau)\neq L(\pi)$ and $\pi(\varepsilon)\in L(\tau)$ , $\tau(\varepsilon)\sqsubset\pi(\varepsilon)$ ; thus $|\tau(\varepsilon)|<|\pi(\varepsilon)|$ and so $\pi\succ\tau$ .
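The preference-based argument can be illustrated as follows. In this sketch a simple block-regular pattern is identified with its constant part $\tau(\varepsilon)$, and $\tau$ is consistent with $(\pi(\varepsilon),+)$ exactly when $\tau(\varepsilon)$ is a subsequence of $\pi(\varepsilon)$. Among the consistent hypotheses, the learner keeps those with the longest constant part. The function names are hypothetical.

```python
from itertools import combinations

def subsequences(word):
    """All (scattered) subsequences of word, including word itself and the empty word."""
    return {"".join(word[i] for i in idx)
            for size in range(len(word) + 1)
            for idx in combinations(range(len(word)), size)}

def preferred_hypotheses(positive_example):
    """Constant parts of the most preferred patterns consistent with (positive_example, +)."""
    consistent = subsequences(positive_example)
    longest = max(len(c) for c in consistent)
    return {c for c in consistent if len(c) == longest}

result = preferred_hypotheses("0110")
```

Since every proper subsequence is strictly shorter than the word itself, the most preferred consistent hypothesis is always unique, which is why a single positive example suffices.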

Not surprisingly, the TD of a simple block-regular pattern is in general larger w.r.t. the whole class of regular patterns than w.r.t. the restricted class of simple block-regular patterns. It is worth noting that a smallest teaching set for a simple block-regular pattern $\pi$ need not contain $\pi(\varepsilon)$ as a positive example, as the proof of the following result (cf. Appendices C and E) shows.

Theorem 4

$\mbox{TD}(\mbox{SR$ \Pi $}^{z},\mbox{R}\Pi^{z})=3$ .

To prove the lower bound in Theorem 4, it suffices to observe that any teaching set (w.r.t. the whole class of regular patterns) for a non-constant regular pattern not equivalent to $x_{1}$ must contain at least two positive examples and one negative example; for a very similar proof, see [7, Theorem 12.1]. We prove the upper bound. If $z=1$ , then $\mbox{R}\Pi^{z}$ is the union of $\mbox{SR$ \Pi $}^{z}$ and all constant patterns (up to equivalence). By the proof of Theorem 3, any $\pi\in\mbox{SR$ \Pi $}^{z}$ can be distinguished from every non-equivalent $\tau\in\mbox{SR$ \Pi $}^{z}$ with one positive example or one positive and one negative example; to distinguish $\pi$ from any constant pattern, at most one additional positive example is needed. Suppose $z\geq 2$ . The proof will be split into the cases (i) $|\Sigma|=2$ and (ii) $|\Sigma|\geq 3$ .

Lemma 5

If $\pi\in\mbox{SR$ \Pi $}^{2}$ , then $\mbox{TD}(\pi,\mbox{R}\Pi^{2})\leq 3$ .

The basic proof idea of Lemma 5 – using positive examples to exclude certain types of constant segments of the target pattern – can also be generalised to the case $|\Sigma|\geq 3$ , although the details of the construction are more tedious.

Lemma 6

Suppose $z=|\Sigma|\geq 3$ . If $\pi\in\mbox{SR$ \Pi $}^{z}$ , then $\mbox{TD}(\pi,R\Pi^{z})\leq 3$ .

The next result determines upper (for $|\Sigma|\in\{1,\infty\}$ ) and lower (for $|\Sigma|\in{\mathbb{N}}\cup\{\infty\}$ ) bounds for the TD of any given simple block-regular pattern w.r.t. the whole class of patterns. It turns out that these bounds vary with the alphabet size.

Theorem 7

Suppose $z\in{\mathbb{N}}\cup\{\infty\}$ and $\pi=x_{1}c_{1}x_{2}\ldots c_{n-1}x_{n}$ for some $c_{1},\ldots,$ $c_{n-1}\in\Sigma$ and $n\geq 2$ . (i) If $z\in\{1,\infty\}$ , then $\mbox{TD}(\pi,\Pi^{z})\in\{1,3\}$ . (ii) If $2\leq z<\infty$ , then $\mbox{TD}(\pi,\Pi^{z})=\Omega(|\pi|)$ .

We do not know whether the lower bound given in Assertion (ii) of Theorem 7 is also an upper bound (up to numerical constant factors). In the proof of [7, Proposition 4], it was shown that the TD of every simple block-regular pattern $\pi$ is $O(2^{|\pi|})$ .

5 Finite Distinguishability of $m$ -Quasi-Regular, Non-Cross $m$ -Regular and $m$ -Regular Patterns

This section studies the problem of determining the subclass of finitely distinguishable patterns w.r.t. three classes: the $m$-quasi-regular patterns, the non-cross $m$-regular patterns, and the $m$-regular patterns. The first two classes are interesting from an algorithmic learning perspective as they provide natural examples of pattern language families that are learnable in the limit [28, 31]. (Roughly speaking, a class of languages is learnable in the limit if there is a learning algorithm such that, given any infinite sequence of all positive examples for any language $L$ in the class, the algorithm outputs a corresponding sequence of guesses for the target language (based on a representation system for the languages in the class) that converges to a fixed representation for $L$; this model is due to Gold [16].) The $m$-regular patterns are a fairly natural generalisation of the $m$-quasi-regular patterns; as will be seen later, the class of constant-free $4$-regular patterns is not identifiable in the limit over binary alphabets, and in particular, not all $m$-regular patterns are finitely distinguishable over binary alphabets.

Notation 8

Fix any $\ell\geq 0$ and $z,m\geq 1$ . An $\ell$ -variable pattern is one that has at most $\ell$ distinct variables. Let $\Pi^{z}_{\ell,m}$ denote the class of $\ell$ -variable patterns $\pi$ such that every variable occurs at most $m$ times in $\pi$ ; if $\ell=\infty$ , then there is no uniform upper bound on the number of distinct variables occurring in any $\pi\in\Pi^{z}_{\ell,m}$ ; if $m=\infty$ , then there is no uniform upper bound on the number of times any variable can occur. We call every $\pi\in\Pi^{z}_{\infty,m}$ an $m$ -regular pattern. $\Pi^{z}_{\infty,m,cf}$ denotes the class of all constant-free $m$ -regular patterns.

Let $\mbox{QR$ \Pi $}^{z}_{\ell,m}$ denote the class of all $\ell$ -variable patterns $\pi$ such that every variable of $\pi$ occurs exactly $m$ times; again, if $\ell=\infty$ , then there is no uniform upper bound on the number of distinct variables occurring in any $\pi\in\mbox{QR$ \Pi $}^{z}_{\ell,m}$ . Every $\pi\in\mbox{QR$ \Pi $}^{z}_{\infty,m}$ is known as an $m$ -quasi-regular pattern [28]. We denote the class of constant-free $m$ -quasi-regular patterns by $\mbox{QR$ \Pi $}^{z}_{\infty,m,cf}$ .
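The class memberships of Notation 8 depend only on how often each variable occurs. A minimal Python sketch follows, assuming (purely for illustration) that a pattern is encoded as a list of symbols in which variables are the strings starting with `x` and every other symbol is a constant.

```python
from collections import Counter

def variable_counts(pattern):
    """Number of occurrences of each variable (symbols starting with 'x')."""
    return Counter(sym for sym in pattern if sym.startswith("x"))

def variable_frequency(pattern):
    """Largest number of occurrences of any single variable (0 for constant patterns)."""
    return max(variable_counts(pattern).values(), default=0)

def is_m_regular(pattern, m):
    """Every variable occurs at most m times."""
    return all(count <= m for count in variable_counts(pattern).values())

def is_m_quasi_regular(pattern, m):
    """Every variable occurs exactly m times."""
    return all(count == m for count in variable_counts(pattern).values())

p = ["x1", "0", "x2", "x1", "1", "x2"]   # x1 0 x2 x1 1 x2
q = ["x1", "x1", "x2"]                   # x1 x1 x2
```

Here `p` is $2$-quasi-regular (hence $2$-regular), while `q` is $2$-regular but not $2$-quasi-regular because `x2` occurs only once.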

Mitchell [28] showed that for any $m\geq 1$ , the class of $m$ -quasi-regular pattern languages is learnable in the limit. The next theorem shows that for all $z\geq 1$ , every $m$ -quasi-regular pattern even has a finite teaching set w.r.t. $\mbox{QR$ \Pi $}^{z}_{\infty,m}$ . Thus, at least as far as $m$ -quasi-regular patterns are concerned, version space learning with a helpful teacher is just as powerful as learning in the limit. We begin with a lemma, which states that for any given $m$ -quasi-regular pattern $\pi$ and every $m$ -quasi-regular pattern $\tau$ with $L(\tau)\not\subseteq L(\pi)$ , there is some $S\subseteq\mbox{Var}(\tau)$ of size at most linear in $|\mbox{Var}(\pi)|$ for which $L\left(\tau{\big{|}}_{\Sigma\cup S}\right)\not\subseteq L(\pi)$ ; for any $S^{\prime}\subseteq X\cup\Sigma$ , $\tau{\big{|}}_{S^{\prime}}$ is the subsequence of $\tau$ obtained by deleting symbols not in $S^{\prime}$ .

Lemma 9

Fix $\Sigma$ with $z=|\Sigma|\geq 2$ and $\{0,1\}\subseteq\Sigma$ . Suppose $m\geq 1$ and $\pi,\tau\in\mbox{QR$ \Pi $}^{z}_{\infty,m}$ . If $\tau(\varepsilon)=\pi(\varepsilon)$ and $L(\tau)\not\subseteq L(\pi)$ , then there is some $S\subseteq\mbox{Var}(\tau)$ with $|S|\leq 1+\left(|\pi(\varepsilon)|+m+4\right)\cdot|\mbox{Var}(\pi)|$ such that $L\left(\tau{\big{|}}_{\Sigma\cup S}\right)\not\subseteq L(\pi)$ .

Theorem 10

If $z=1$ , then $\mbox{TD}(\mbox{QR$ \Pi $}^{z}_{\infty,m})=3$ . If $z\geq 2$ , then for every $\pi\in\mbox{QR$ \Pi $}^{z}_{\infty,m}$ , $\mbox{TD}(\pi,\mbox{QR$ \Pi $}^{z}_{\infty,m})=O(2^{|\pi(\varepsilon)|}+D\cdot(|\pi(\varepsilon)|+D\cdot m)^{D\cdot m})$ , where $D\mathrel{\mathop{\mathchar 58\relax}}=\max(\{(1/m)\cdot(2\cdot|\pi|-|\pi(\varepsilon)|),1+(|\pi(\varepsilon)|+m+4)\cdot|\mbox{Var}(\pi)|\})$ .

Next, we show that the PBTD of the class of constant-free $m$ -quasi-regular pattern languages is exactly $1$ for large enough alphabet sizes. We establish this value by observing that if the adjacency graph of a constant-free $m$ -quasi-regular pattern $\pi$ [26, Chapter 3] has a colouring satisfying certain conditions, where each colour corresponds to a letter in the alphabet, then such a colouring can be used to construct a positive example for $\pi$ that distinguishes it from all shorter constant-free $m$ -quasi-regular patterns.

Theorem 11

For any $z\geq 1$ , $\mbox{TD}(\mbox{QR$ \Pi $}^{z}_{\infty,1,cf})=\mbox{PBTD}(\mbox{QR$ \Pi $}^{z}_{\infty,1,cf})=0$ . Suppose $m\geq 2$ . If $z=|\Sigma|\geq 4m^{2}+1$ , then $\mbox{PBTD}(\mbox{QR$ \Pi $}^{z}_{\infty,m,cf})=1$ .

While the PBTD of the class of $m$ -quasi-regular patterns remains open in full generality, we observe that over unary alphabets, the PBTD of this class is exactly $2$ for any $m\geq 1$ .

Proposition 12

For any $m\geq 1$ , $\mbox{PBTD}(\mbox{QR$ \Pi $}^{1}_{\infty,m})=2$ . If $z\geq 2$ , then $\mbox{PBTD}(\mbox{QR$ \Pi $}^{z}_{\infty,m})\geq 2$ .

A non-cross pattern $\pi$ is a constant-free pattern of the shape $x_{0}^{n_{0}}x_{1}^{n_{1}}\ldots x_{k}^{n_{k}}$ , where $n_{0},n_{1},\ldots,n_{k}$ $\in{\mathbb{N}}$ . Let $\mbox{NC}\Pi^{z}_{\infty,m}$ denote the class of all non-cross patterns $\pi$ over any $\Sigma$ with $|\Sigma|=z$ such that every variable of $\pi$ occurs at most $m$ times. $\mbox{NC}\Pi^{z}_{\infty,\infty}$ coincides with $\mbox{NC}\Pi^{z}$ , the class of all non-cross patterns. The next main result shows that for any fixed $m$ , the TD of every pattern in $\mbox{NC}\Pi^{z}_{\infty,m}$ is not only finite, but also has a uniform upper bound depending only on $m$ . Slightly more interestingly, the teaching complexity of $\mbox{NC}\Pi^{z}_{\infty,m}$ in the preference-based teaching model varies with the alphabet size when $m\geq 2$ : over unary alphabets, the PBTD of this class is exactly linear in $m$ , while over alphabets of size at least $2$ , the PBTD is exactly $1$ . In the following lemma, we observe certain properties of an “unambiguous” word that was constructed in [31, Lemma 13].

Lemma 13

(Based on [31, Lemma 13]) Suppose $\{0,1\}\subseteq\Sigma$ . Fix any $m\geq 2$ , and let $\pi=x_{0}^{n_{0}}\ldots x_{k}^{n_{k}}$ , where $n_{0},\ldots,n_{k}\in\{2,\ldots,m\}$ . Suppose there are positive numbers $\ell$ and $i_{1},\ldots,i_{\ell}$ such that

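$$w=\underbrace{(0^{1}1)^{i_{1}}}_{I_{1}}\underbrace{(0^{2}1)^{i_{2}}}_{I_{2}}\ldots\underbrace{(0^{\ell}1)^{i_{\ell}}}_{I_{\ell}}\qquad(2)$$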

where, for each $j\in\{1,\ldots,\ell\}$ , $I_{j}$ is the closed interval of positions of $w$ occupied by the subword $(0^{j}1)^{i_{j}}$ as indicated with braces in Equation (2). For each $j\in\{0,\ldots,k\}$ , let $J_{j}$ denote the closed interval of positions of $\pi$ occupied by $x_{j}^{n_{j}}$ . Let $h$ be any substitution such that $h(\pi)=w$ and $h(x_{i})\neq\varepsilon$ for all $i\in\{0,\ldots,k\}$ . Then the following hold.

(i) For all $j\in\{0,\ldots,k\}$, $h(x_{j})$ is of the shape $(0^{j^{\prime}}1)^{i^{\prime}}$ for some $j^{\prime}\in\{1,\ldots,\ell\}$ and $i^{\prime}\in\{1,\ldots,i_{j^{\prime}}\}$.

(ii) For each $j\in\{1,\ldots,\ell\}$, there are $g_{j}\in\{0,\ldots,k\}$ and $h_{j}\in\{0,\ldots,k-g_{j}\}$ such that $I_{j}=\coprod_{l=0}^{h_{j}}{\mathcal{I}}_{h,\pi}(J_{g_{j}+l})$.

Theorem 14

For all $z\in{\mathbb{N}}\cup\{\infty\}$ , $\mbox{TD}(\mbox{NC}\Pi^{z}_{\infty,1})=\mbox{PBTD}(\mbox{NC}\Pi^{z}_{\infty,1})=0$ . Suppose $m\geq 2$ .

(i) If $z=1$, then $\mbox{TD}(\mbox{NC}\Pi^{z}_{\infty,m})=\Theta(m)$ and $\mbox{PBTD}(\mbox{NC}\Pi^{z}_{\infty,m})=\Theta(m)$.

(ii) For any $n\in{\mathbb{N}}_{0}$, let $\omega(n)$ denote the number of distinct prime factors of $n$ and let $\Pi(n)$ denote the number of prime powers not exceeding $n$. If $z\geq 2$, then $\max(\{\omega(n)\mathrel{\mathop{\mathchar 58\relax}}n\leq m\})\leq\mbox{TD}(\mbox{NC}\Pi^{z}_{\infty,m})\leq 2+\Pi(m-1)$ and $\mbox{PBTD}(\mbox{NC}\Pi^{z}_{\infty,m})=\mbox{PBTD}(\mbox{NC}\Pi^{z})=1$. In particular, $\max(\{\omega(n)\mathrel{\mathop{\mathchar 58\relax}}n\leq m\})\leq\mbox{TD}(\mbox{NC}\Pi^{z}_{\infty,m})<O\left((m-1)^{\frac{1}{2}}\log(m-1)\right)+\displaystyle\frac{1.25506(m-1)}{\log(m-1)}$.
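To get a feeling for the gap between the two bounds in Assertion (ii), they can be evaluated numerically for small $m$. The Python sketch below assumes that a prime power is a number $p^{a}$ with $p$ prime and $a\geq 1$ (so $1$ is not counted) and that the maximum of $\omega$ is taken over $1\leq n\leq m$; the function names are hypothetical.

```python
def prime_factorisation(n):
    """Map each prime factor of n >= 1 to its exponent (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def omega(n):
    """Number of distinct prime factors of n."""
    return len(prime_factorisation(n))

def prime_power_count(n):
    """Number of prime powers p^a (a >= 1) not exceeding n."""
    return sum(1 for q in range(2, n + 1) if omega(q) == 1)

def td_bounds(m):
    """(lower, upper) bounds of Theorem 14(ii) for a given m >= 2."""
    lower = max(omega(n) for n in range(1, m + 1))
    upper = 2 + prime_power_count(m - 1)
    return lower, upper

bounds_9 = td_bounds(9)
bounds_30 = td_bounds(30)
```

For instance, with $m=9$ the prime powers not exceeding $8$ are $2,3,4,5,7,8$, so the upper bound evaluates to $8$, while the lower bound is $\omega(6)=2$.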

It is possible that neither the lower bound nor the upper bound on $\mbox{TD}(\mbox{NC}\Pi^{z}_{\infty,m})$ given in Theorem 14 is tight for almost all $m$. The proof of Theorem 14 (cf. Appendix N) shows that the TD of any general non-cross pattern $\pi$ w.r.t. $\mbox{NC}\Pi^{z}_{\infty,m}$ (for any fixed $z\geq 2$ and $m\geq 2$) is at most $2$ plus the number of maximal proper prime power factors of the variable frequencies of $\pi$, but as the following example shows, this upper bound is not always sharp even for non-cross succinct patterns with three variables; a pattern $\pi$ is succinct [28, 32] iff there is no pattern $\tau$ such that $L(\tau)=L(\pi)$ and $|\tau|<|\pi|$.

Example 15

Suppose $\{0,1\}\subseteq\Sigma$ . Let $\pi=x_{1}^{4}x_{2}^{8}x_{3}^{9}$ . There are $3$ maximal proper prime power factors of $4,8$ and $9$ , namely, $2,4$ and $3$ , and so by the proof of Theorem 14, the TD of $\pi$ w.r.t. $\mbox{NC}\Pi^{|\Sigma|}_{\infty,9}$ is at most $2+3=5$ . However, $\pi$ has a teaching set of size $4$ (further details are given in Appendix O).
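The count used in Example 15 can be reproduced with a short Python sketch. It assumes the following reading of "maximal proper prime power factor" of a frequency $n$: a prime power $q\neq n$ dividing $n$ that does not properly divide another such prime power factor of $n$. This reading agrees with the values $2,4,3$ listed in the example but is otherwise an assumption, as are the function names.

```python
def is_prime_power(q):
    """True iff q = p^a for some prime p and a >= 1."""
    if q < 2:
        return False
    p = next(d for d in range(2, q + 1) if q % d == 0)  # smallest prime factor
    while q % p == 0:
        q //= p
    return q == 1

def maximal_proper_prime_power_factors(n):
    """Prime powers q != n dividing n that divide no other such prime power."""
    candidates = [q for q in range(2, n) if n % q == 0 and is_prime_power(q)]
    return {q for q in candidates
            if not any(r != q and r % q == 0 for r in candidates)}

def upper_bound_from_frequencies(frequencies):
    """2 plus the number of distinct maximal proper prime power factors."""
    factors = set()
    for n in frequencies:
        factors |= maximal_proper_prime_power_factors(n)
    return 2 + len(factors)

factors_example = [maximal_proper_prime_power_factors(n) for n in (4, 8, 9)]
bound_example = upper_bound_from_frequencies([4, 8, 9])
```

The computed bound for the frequencies $4,8,9$ is $5$, one more than the size of the teaching set mentioned in the example.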

The next result exemplifies the general observation that a larger alphabet allows pattern languages to be distinguished using a relatively smaller number of labelled examples.

Theorem 16

$\mbox{PBTD}(\Pi^{\infty})=2$ and for any $m\geq 1$, $\mbox{PBTD}(\Pi^{1}_{\infty,m})=\Theta(m)$.

The next series of results deals with the finite distinguishability problem for the general class of $m$-regular patterns. We begin with a few preparatory results. The first part of Theorem 17 gives a sufficient criterion for the inclusion of pattern languages, and it was observed by Jiang, Kinber, Salomaa and Yu [22]; the second part, due to Ohlebusch and Ukkonen [30], states that the existence of a constant-preserving morphism from $\pi$ to $\tau$ (where $\pi$ and $\tau$ are similar) is also necessary for $L(\tau)\subseteq L(\pi)$ if $\Sigma$ contains at least two letters that do not occur in $\pi$ or $\tau$. The second result is based on a few lemmas due to Reidenbach [32, Lemmas 4–6], adapted to the case of general patterns over an infinite alphabet.

Theorem 17

[22, 30] Let $\Sigma$ be an alphabet, and let $\pi,\tau\in\Pi^{|\Sigma|}$. Then $L(\pi)\subseteq L(\tau)$ if there exists a constant-preserving morphism $g\mathrel{\mathop{\mathchar 58\relax}}(X\cup\Sigma)^{*}\mapsto(X\cup\Sigma)^{*}$ with $g(\tau)=\pi$. If $|\Sigma|\geq|\mbox{Const}(\pi)|+2$, $|\Sigma|\geq|\mbox{Const}(\tau)|+2$ and $\pi$ is similar to $\tau$, then $L(\pi)\subseteq L(\tau)$ only if there exists a constant-preserving morphism $g\mathrel{\mathop{\mathchar 58\relax}}(X\cup\Sigma)^{*}\mapsto(X\cup\Sigma)^{*}$ with $g(\tau)=\pi$.

Lemma 18

(Based on [32]) Suppose $|\Sigma|=\infty$ . Fix any $\pi\in\Pi^{\infty}$ such that $\pi$ is succinct. Let $Y=\{y_{1},y_{2},\ldots\}$ be an infinite set of variables such that $Y\cap\mbox{Var}(\pi)=\emptyset$ . Suppose $\tau\in\pi\shuffle Y^{*}$ . Then $L(\tau)=L(\pi)$ iff

(i) For all $Y^{\prime}\in Y^{+}$ and $\delta,\delta^{\prime}\in\mbox{Const}(\pi)$, the following hold: (a) $Y^{\prime}\delta$ is not a prefix of $\tau$, (b) $\delta Y^{\prime}$ is not a suffix of $\tau$, (c) $\delta Y^{\prime}\delta^{\prime}$ is not a substring of $\tau$;

(ii) There is a constant-preserving morphism $g\mathrel{\mathop{\mathchar 58\relax}}(X\cup\Sigma)^{*}\mapsto(X\cup\Sigma)^{*}$ such that $g(\pi)=\tau$;

(iii) For all constant-preserving morphisms $h\mathrel{\mathop{\mathchar 58\relax}}(X\cup\Sigma)^{*}\mapsto(X\cup\Sigma)^{*}$ with $h(\pi)=\tau$ and for all $x\in\mbox{Var}(\pi)$, if there exist $Y_{1},Y_{2}\in Y^{*}$ such that $Y_{1}xY_{2}$ is a substring of $\tau$ and $Y_{1}$ (resp. $Y_{2}$) is not immediately preceded (resp. succeeded) by any $y\in Y$ w.r.t. $\tau$, then there are splittings $Y^{1}_{1}Y^{2}_{1}$ and $Y^{1}_{2}Y^{2}_{2}$ of $Y_{1}$ and $Y_{2}$ respectively for which $h(x)=Y^{2}_{1}xY^{1}_{2}$.

The next crucial lemma shows that for any fixed $m\geq 1$ , only finitely many negative examples are needed to distinguish a succinct pattern $\pi$ from all patterns $\pi^{\prime}\in\Pi^{\infty}_{\infty,m}$ obtained by shuffling $\pi$ with an infinite set $Y$ of variables such that $Y$ and $\mbox{Var}(\pi)$ are disjoint.

Lemma 19

Fix $\Sigma$ with $|\Sigma|=\infty$ . Suppose $k\geq 0$ , $m\geq 1$ and $\pi\in\Pi^{\infty}_{k,m}$ . Let $Y=\{y_{1},y_{2},\ldots\}$ be an infinite set of variables such that $Y\cap\mbox{Var}(\pi)=\emptyset$ . Suppose $\tau\in\left(\pi\shuffle Y^{*}\right)\cap\Pi^{\infty}_{\infty,m}$ . There is some $\tau^{\prime}\in\Pi^{\infty}_{4mk+|\pi|+2,m}$ such that $\tau^{\prime}=\tau{\big{|}}_{\Sigma\cup\mbox{Var}(\pi)\cup S}$ for some finite $S\subset Y$ , and if $L(\pi)\subset L(\tau)$ , then $L(\pi)\subset L(\tau^{\prime})$ .

Theorem 20

Suppose $m\geq 1$ .

(i) $\mbox{TD}(\Pi^{1}_{\infty,m})\leq 2^{m}+m+1$ and for all $\pi\in\Pi^{\infty}_{k,m}$ with $k\geq 1$, $\mbox{TD}(\pi,\Pi^{\infty}_{\infty,m})=O((D+1)^{D})$, where $D\mathrel{\mathop{\mathchar 58\relax}}=(4mk+|\pi|+2)\cdot m$.

(ii) Let $1\Pi^{z}_{m}$ denote the class of patterns $\pi$ over any alphabet of size $z$ such that $\pi$ contains at most one variable that occurs more than $m$ times. Suppose $\pi\in 1\Pi^{z}_{m}$. If $z\geq 4$, then $\mbox{TD}(\pi,1\Pi^{z}_{m})<\infty$ only if $\pi$ contains a variable that occurs more than $m$ times or $\pi\in\mbox{SR$ \Pi $}^{z}$. If $z=\infty$, then $\mbox{TD}(\pi,1\Pi^{z}_{m})<\infty$ if $\pi$ contains a variable that occurs more than $m$ times or $\pi\in\mbox{SR$ \Pi $}^{z}$.

The next result shows that over binary alphabets, even the class of constant-free $4$ -regular pattern languages contains patterns with infinite TD. We prove this by modifying Reidenbach’s [31] proof of the non-learnability of $x_{1}^{2}x_{2}^{2}x_{3}^{2}$ so that every pattern constructed in the proof has variable frequency at most $4$ .

Theorem 21

(Based on [31, Theorem 5]) Suppose $\pi=x_{1}^{2}x_{2}^{2}x_{3}^{2}$ . For any $m\geq 4$ , $\mbox{TD}(\pi,\Pi^{2}_{\infty,m,cf})=\infty$ .

Remark 22

The lower bound $4$ on $m$ in Theorem 21 is tight in the sense that the TD of $\pi\mathrel{\mathop{\mathchar 58\relax}}=x_{1}^{2}x_{2}^{2}x_{3}^{2}$ w.r.t. $\Pi^{2}_{\infty,3}$ is finite. In fact, $T\mathrel{\mathop{\mathchar 58\relax}}=\{(\varepsilon,+),(0^{2}1^{2}0^{2},+),(0,-),(01^{2}0,-),(0^{3},-),((01)^{2}(0^{2}1)^{2}(0^{3}1)^{2}(0^{4}1)^{2},-)\}$ is a teaching set for $\pi$ w.r.t. $\Pi^{2}_{\infty,3}$ (further details are given in Appendix V).
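That the labels in $T$ are consistent with $\pi=x_{1}^{2}x_{2}^{2}x_{3}^{2}$ can be verified directly: a word belongs to $L(\pi)$ iff it is a concatenation of three (possibly empty) squares. The Python sketch below implements this membership test for non-cross patterns given by their exponents; the function name is hypothetical, and the check covers consistency only, not the claim that $T$ is a teaching set.

```python
from functools import lru_cache

def in_noncross_language(word, exponents):
    """True iff word = u_0^{n_0} ... u_k^{n_k} for some (possibly empty) words u_j."""
    @lru_cache(maxsize=None)
    def match(start, j):
        if j == len(exponents):
            return start == len(word)
        n = exponents[j]
        # Try every length for the image of the j-th variable (0 means erased).
        for length in range((len(word) - start) // n + 1):
            block = word[start:start + length]
            if (word[start:start + n * length] == block * n
                    and match(start + n * length, j + 1)):
                return True
        return False
    return match(0, 0)

# The labelled examples of T from Remark 22 (True = positive, False = negative).
T = [
    ("", True),
    ("001100", True),
    ("0", False),
    ("0110", False),
    ("000", False),
    ("01" * 2 + "001" * 2 + "0001" * 2 + "00001" * 2, False),
]
consistent = all(in_noncross_language(w, (2, 2, 2)) == label for w, label in T)
```

The last word is a product of four nonempty squares; the test confirms that it cannot be regrouped into three.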

6 Conclusion

Table 1 summarises some of the main results of this paper. For three types of pattern classes studied – the simple block-regular, $m$-quasi-regular and $m$-regular non-cross patterns – it was found that over any alphabet size, every pattern in the class is finitely distinguishable; in the case of simple block-regular and $m$-regular non-cross patterns, one also has an upper bound on the TD of the class of such patterns that is, depending on the alphabet size, constant, linear or sublinear in $m$. The most delicate questions appear to be those concerning the $m$-regular patterns for finite alphabets of size at least $2$; we only know that for all $m\geq 4$, there are patterns in $\Pi^{2}_{\infty,m,cf}$ that are not finitely distinguishable (and even not learnable in the limit). We note that the class of non-cross patterns over any alphabet and the class of all patterns over infinite alphabets are learnable in the limit [31, 28] (this implies that for every pattern $\pi$ belonging to any one of these classes, $L(\pi)$ contains a finite set that distinguishes $\pi$ from all $\pi^{\prime}$ in the class such that $L(\pi^{\prime})\subset L(\pi)$ [4, Theorem 1]), but they have relatively restricted subclasses of finitely distinguishable patterns [7, Theorems 3,10]. Thus the fact that every pattern in the $m$-regular versions of these classes has a finite TD suggests that the variable frequency of a pattern class may play a role in determining whether any given pattern $\pi$ can be finitely distinguished from all $\pi^{\prime}$ such that $L(\pi^{\prime})\not\subseteq L(\pi)$. On the other hand, we have seen in Theorem 20(ii) that even constant patterns cannot be finitely distinguished w.r.t. the class of patterns with at most one variable (but no uniform upper bound on the number of variable occurrences).
It might be interesting to know whether there is a ‘natural’ class $\Pi$ of $m$ -regular patterns such that $\Pi$ is learnable in the limit but $\mbox{TD}(\pi,\Pi)=\infty$ for some $\pi\in\Pi$ . We also suspect that $\mbox{TD}(\Pi^{\infty}_{\infty,m})=\infty$ for some $m\geq 2$ and $\mbox{TD}(\mbox{QR$ \Pi $}^{z}_{\infty,m})=\infty$ for some finite $z\geq 2$ and $m\geq 1$ , but as yet do not know how to prove this.

Acknowledgements. The author was supported (as RF) by the Singapore Ministry of Education Academic Research Fund grant MOE2016-T2-1-019 / R146-000-234-112. I sincerely thank Fahimeh Bayeh, Sanjay Jain and Sandra Zilles for proofreading the manuscript; their numerous suggestions for corrections and improvements are gratefully acknowledged. I also thank Fahimeh Bayeh very much for her suggestion to look at the PBTD of $m$ -quasi-regular patterns over unary alphabets.

Appendix

This appendix contains the proofs not presented in the main part of the paper as well as additional definitions/notation and examples.

A Additional Definitions and Notation

In this section, we introduce additional definitions and notation needed for the proofs in the appendix.

Given any $x\in X$ , let $\mbox{N}(x,\pi)$ denote the set of all $s\in X\cup\Sigma$ such that $s$ is adjacent to an occurrence of $x$ in $\pi$ ; call $\mbox{N}(x,\pi)$ the neighbourhood of $x$ in $\pi$ . For each $\delta\in\Sigma$ and $w\in(X\cup\Sigma)^{*}$ , $\#(\delta)[w]$ denotes the number of occurrences of $\delta$ in $w$ .

If $|\Sigma|=2$ and $\delta\in\Sigma$ , then $\overline{\delta}$ denotes the unique element of $\Sigma\setminus\{\delta\}$ . For any $\pi\in(X\cup\Sigma)^{+}$ and variables $x_{i_{1}},\ldots,x_{i_{n}}$ occurring in $\pi$ , let $\pi[x_{i_{1}}\rightarrow\alpha_{1},\ldots,x_{i_{n}}\rightarrow\alpha_{n}]$ denote the word obtained from $\pi$ by substituting $\alpha_{j}$ for $x_{i_{j}}$ whenever $j\in\{1,\ldots,n\}$ and substituting $\varepsilon$ for every other variable. We will often assume that a pattern $\pi\in\Pi^{z}$ is normalised in the sense that the $k$ variables occurring in $\pi$ are named $x_{1},\ldots,x_{k}$ in order of their first occurrences from left to right (or $x$ if $k=1$ ).

Given any pattern $\pi$ and substitution $h\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ , $h$ induces a mapping of closed intervals of positions of $\pi$ to closed intervals of positions of $h(\pi)$ . This mapping will be denoted by ${\mathcal{I}}_{h,\pi}$ . For any position $p$ of $\pi$ , ${\mathcal{I}}_{h,\pi}(\{p\})$ will simply be written as ${\mathcal{I}}_{h,\pi}(p)$ . We define the inverse of ${\mathcal{I}}_{h,\pi}$ , denoted $\overline{{\mathcal{I}}}_{h,\pi}$ , to be the mapping of closed intervals of positions of $h(\pi)$ to closed intervals of positions of $\pi$ such that for all closed intervals $J\subseteq\{1,\ldots,|h(\pi)|\}$ , $\overline{{\mathcal{I}}}_{h,\pi}(J)$ is the smallest closed interval $I\subseteq\{1,\ldots,|\pi|\}$ such that $J\subseteq{\mathcal{I}}_{h,\pi}(I)$ (in other words, $J\subseteq{\mathcal{I}}_{h,\pi}(I)$ and for all $I^{\prime}\subset I$ , $J\not\subseteq{\mathcal{I}}_{h,\pi}(I^{\prime})$ ). For any position $q$ of $h(\pi)$ , $\overline{{\mathcal{I}}}_{h,\pi}(\{q\})$ will be abbreviated to $\overline{{\mathcal{I}}}_{h,\pi}(q)$ .

Fix any $z=|\Sigma|\geq 1$ and $\pi\in\Pi^{z}$ . Suppose that $\gamma\in L(\pi)$ for some $\gamma\in\Sigma^{*}$ , as witnessed by the substitution $h\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ . We define a cut of $\gamma$ relative to $(h,\pi)$ to be any pair $(I_{1},I_{2})$ of disjoint nonempty closed intervals of positions of $\gamma$ such that $I_{1}=[r_{1},r_{2}]$ and $I_{2}=[r_{2}+1,r_{3}]$ for some $r_{1},r_{2},r_{3}\in\{1,\ldots,|\gamma|\}$ , and there exists $q\in\{1,\ldots,|\pi|\}$ with $\mathcal{I}_{h,\pi}(q)=I_{1}$ and $\mathcal{I}_{h,\pi}(q+1)=I_{2}$ . If $(I_{1},I_{2})$ is a cut of $\gamma$ relative to $(h,\pi)$ , then the right endpoint of $I_{1}$ (which is one less than the left endpoint of $I_{2}$ ) will be called a cut-point of $\gamma$ relative to $(h,\pi)$ . If the choice of $(h,\pi)$ is clear from the context, then $(I_{1},I_{2})$ (resp. the right endpoint of $I_{1}$ ) will simply be called a cut of $\gamma$ (resp. cut-point of $\gamma$ ).

Example A.1

[8] Let $\pi=x_{1}x_{2}x_{1}x_{2}x_{1}$ and $\gamma=0111011101$. Then $h\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$, defined by $h(x_{1})=01$ and $h(x_{2})=11$, witnesses $\gamma\in L(\pi)$. One has that

$$\gamma=\underbrace{01}_{I_{1}}\underbrace{11}_{I_{2}}011101\qquad(3)$$

and $(I_{1},I_{2})$ (where the positions of $\gamma$ occupied by $I_{1}$ and $I_{2}$ are illustrated in Equation (3)) is a cut of $\gamma$ relative to $(h,\pi)$ ; the corresponding cut-point of $\gamma$ relative to $(h,\pi)$ is $2$ .

The following basic lemma elucidates the connection between the number of cuts of $h(\pi)$ and the length of $\pi$ . It will be useful in subsequent results for showing that $L(\pi)$ cannot contain certain words.

Lemma A.2

[8] If $\gamma$ has $d$ distinct cuts relative to $(h,\pi)$, then $\left|\pi\right|\geq d+1$.

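Proof. Let $(I_{1},I_{2})$ and $(J_{1},J_{2})$ be any two consecutive cuts of $\gamma$ such that the left endpoint of $I_{1}$ is smaller than the left endpoint of $J_{1}$. Then $I_{1}\neq J_{1}$, and since $I_{2}$ (resp. $J_{2}$) is the image of the position of $\pi$ immediately following the position whose image is $I_{1}$ (resp. $J_{1}$), it follows that $I_{2}\neq J_{2}$. Moreover, $I_{1}\neq I_{2}$ because the two intervals of a cut are disjoint and nonempty, and $I_{1}\neq J_{2}$ because $J_{2}$ lies to the right of $J_{1}$. Hence $I_{1}$, $I_{2}$ and $J_{2}$ correspond to three different positions of $\pi$, so every cut after the first contributes at least one new position of $\pi$ and $\left|\pi\right|\geq d+1$.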
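The cut-points of Example A.1 and the inequality of Lemma A.2 can be reproduced with a few lines of Python. The sketch assumes that a pattern is a list of symbols, that the substitution is a dictionary defined on every variable, and that any symbol missing from the dictionary is a constant; intervals are 1-indexed pairs `(start, end)`, with `end < start` standing for the empty interval. All names are hypothetical.

```python
def position_intervals(pattern, h):
    """Image interval in h(pattern) of each position of the pattern."""
    intervals, cursor = [], 1
    for sym in pattern:
        image = h.get(sym, sym)          # constants are mapped to themselves
        intervals.append((cursor, cursor + len(image) - 1))
        cursor += len(image)
    return intervals

def cut_points(pattern, h):
    """Right endpoints of I_1 over all cuts (I_1, I_2) relative to (h, pattern)."""
    iv = position_intervals(pattern, h)
    return [a_end
            for (a_start, a_end), (b_start, b_end) in zip(iv, iv[1:])
            if a_start <= a_end and b_start <= b_end]   # both images nonempty

pattern = ["x1", "x2", "x1", "x2", "x1"]
h = {"x1": "01", "x2": "11"}
points = cut_points(pattern, h)
```

For this substitution there are four cuts, the first of which has cut-point $2$ as in Example A.1, and $|\pi|=5=d+1$, so the bound of Lemma A.2 is attained.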

B Example of the Mappings ${\mathcal{I}}$ and $\overline{{\mathcal{I}}}$

Example B.1

[8] Suppose $\Sigma=\{a,b\}$ and $\pi=x_{1}x_{2}x_{1}x_{2}x_{1}$. Let $h\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ be the substitution defined by $h(x_{1})=ab$ and $h(x_{2})=bb$. Then $\gamma\mathrel{\mathop{\mathchar 58\relax}}=h(\pi)\in L(\pi)$ and one has that ${\mathcal{I}}_{h,\pi}([1,2])=[1,4]$, ${\mathcal{I}}_{h,\pi}([4,5])=[7,10]$ and $\overline{{\mathcal{I}}}_{h,\pi}(5)=\{3\}$.

[TABLE]
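The values in Example B.1 can be recomputed as follows. This Python sketch assumes the same illustrative encoding as before (a pattern is a list of symbols, the substitution is a dictionary on the variables, other symbols are constants) and represents closed intervals as 1-indexed pairs; the function names are hypothetical.

```python
def image_interval(pattern, h, lo, hi):
    """Image under h of the closed interval [lo, hi] of positions of the pattern."""
    lengths = [len(h.get(sym, sym)) for sym in pattern]
    start = sum(lengths[:lo - 1]) + 1
    end = sum(lengths[:hi])
    return (start, end)

def inverse_interval(pattern, h, lo, hi):
    """Smallest interval of pattern positions whose image covers [lo, hi]."""
    lengths = [len(h.get(sym, sym)) for sym in pattern]

    def owner(q):
        # Position of the pattern whose image contains position q of h(pattern).
        total = 0
        for pos, length in enumerate(lengths, start=1):
            total += length
            if q <= total:
                return pos
        raise ValueError("position outside h(pattern)")

    return (owner(lo), owner(hi))

pattern = ["x1", "x2", "x1", "x2", "x1"]
h = {"x1": "ab", "x2": "bb"}
```

With these definitions, `image_interval(pattern, h, 1, 2)` and `image_interval(pattern, h, 4, 5)` give the intervals $[1,4]$ and $[7,10]$, and `inverse_interval(pattern, h, 5, 5)` gives the single position $3$.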

C Proof of Lemma 5

Proof. Suppose $\pi=x_{1}\delta_{1}x_{2}\delta_{2}\ldots\delta_{n-1}x_{n}$ , where $\delta_{1},\delta_{2},\ldots,\delta_{n-1}\in\Sigma$ . We build a teaching set $T$ for $\pi$ w.r.t. $R\Pi^{2}$ . Let $\tau$ denote any regular pattern that is consistent with $T$ . Let $w_{1}$ be the word obtained from $\pi$ as follows: first, substitute $\overline{\delta}_{1}$ for $x_{1}$ and substitute $\overline{\delta}_{n-1}$ for $x_{n}$ ; second, for every substring of $\pi$ of the shape $\delta x_{i}\delta$ , where $\delta\in\Sigma$ , replace $x_{i}$ with $\overline{\delta}$ ; all other variables are replaced with $\varepsilon$ . Next, let $w_{2}$ be the word obtained from $\pi$ such that for every substring of $\pi$ of the shape $\delta x_{i}\overline{\delta}$ , where $\delta\in\Sigma$ , $x_{i}$ is replaced with $\delta$ ; all other variables are replaced with $\varepsilon$ . Let $\varphi_{1}$ (resp. $\varphi_{2}$ ) be the corresponding substitution witnessing $w_{2}\in L(\tau)$ (resp. $w_{2}\in L(\pi)$ ).

Put $(w_{1},+)$ and $(w_{2},+)$ into $T$ . Since, for every $\delta\in\Sigma$ , $w_{1}$ does not contain the subword $\delta\delta$ while $w_{2}$ does not contain the subword $\delta\overline{\delta}\delta$ , and $w_{1},w_{2}$ both start and end with different letters, $\tau$ must be of the shape $x_{1}A_{1}x_{2}A_{2}\ldots A_{k}x_{k+1}$ , where $k\leq n-1$ and for all $i\in\{1,\ldots,k\}$ , $A_{i}\in\{0,1,01,10\}$ . Thus one may assume, without loss of generality, that $\tau$ is a simple block-regular pattern. For each position $p$ of $w_{2}$ such that $\tau[\overline{{\mathcal{I}}}_{\varphi_{1},\tau}(p)]\in\Sigma$ but $\pi[\overline{{\mathcal{I}}}_{\varphi_{2},\pi}(p)]\in X$ , note that $p\geq 2$ and $w_{2}[p-1]$ must be equal to $w_{2}[p]$ , and since $\tau$ does not contain a substring of the shape $\delta\delta$ for any $\delta\in\Sigma$ (as observed earlier), it follows that $\tau[\overline{{\mathcal{I}}}_{\varphi_{1},\tau}(p-1)]\in X$ . Consequently, $\tau(\varepsilon)\sqsubseteq\pi(\varepsilon)$ . One may then conclude from Lemma 2 that adding $(\widehat{\pi(\varepsilon)},-)$ to $T$ ensures $\tau(\varepsilon)=\pi(\varepsilon)$ . As $\tau$ is simple block-regular, we have that $L(\tau)=L(\pi)$ , as required.

D Example for Lemma 5

We illustrate the construction of the teaching set in the proof of Lemma 5 with the following example.

Example D.1

Suppose $\Sigma=\{0,1\}$ . Let $\pi=x_{1}0x_{2}0x_{3}1x_{4}1x_{5}$ . According to the construction in the proof of Lemma 5, $\pi$ has the teaching set $\{(w_{1},+),(w_{2},+),(w_{3},-)\}$ w.r.t. $R\Pi^{2}$ , where $w_{1},w_{2}$ and $w_{3}$ are defined as follows ( $\theta_{1}$ and $\theta_{2}$ are substitutions witnessing $w_{1}\in L(\pi)$ and $w_{2}\in L(\pi)$ respectively):

• $w_{1}=\underbrace{1}_{\theta_{1}(x_{1})}0\underbrace{1}_{\theta_{1}(x_{2})}01\underbrace{0}_{\theta_{1}(x_{4})}1\underbrace{0}_{\theta_{1}(x_{5})}$ ;

• $w_{2}=00\underbrace{0}_{\theta_{2}(x_{3})}11$ ;

• $w_{3}=\underbrace{0110}\underbrace{1}$ .
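The two positive examples in the proof of Lemma 5 can be computed mechanically. The following Python sketch is an illustration only and not part of the original proof: the function name `lemma5_words` and the encoding of $\pi=x_{1}\delta_{1}x_{2}\ldots\delta_{n-1}x_{n}$ by the string $\delta_{1}\ldots\delta_{n-1}$ are assumptions made here, and $n\geq 2$ is assumed. The negative example $w_{3}$ relies on Lemma 2 and is not computed.

```python
def comp(d):
    """Complement of a binary letter."""
    return "1" if d == "0" else "0"

def lemma5_words(consts):
    """consts = d_1 ... d_{n-1} for pi = x_1 d_1 x_2 ... d_{n-1} x_n over {0,1}."""
    n = len(consts) + 1
    s1 = [""] * n          # substitution for w_1; s1[t] is the image of x_{t+1}
    s2 = [""] * n          # substitution for w_2
    s1[0] = comp(consts[0])        # x_1 receives the complement of d_1
    s1[n - 1] = comp(consts[-1])   # x_n receives the complement of d_{n-1}
    for t in range(1, n - 1):      # x_{t+1} lies between consts[t-1] and consts[t]
        left, right = consts[t - 1], consts[t]
        if left == right:
            s1[t] = comp(left)     # shape d x d: insert the complement in w_1
        else:
            s2[t] = left           # shape d x comp(d): repeat d in w_2

    def apply(s):
        # Interleave the variable images with the constants of pi.
        return s[0] + "".join(c + v for c, v in zip(consts, s[1:]))

    return apply(s1), apply(s2)

# Pattern of Example D.1: pi = x1 0 x2 0 x3 1 x4 1 x5
w1, w2 = lemma5_words("0011")
print(w1, w2)
```

On the pattern of Example D.1 the sketch returns the words $w_{1}$ and $w_{2}$ listed above; in particular $w_{1}$ contains no factor $\delta\delta$ .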

E Proof of Lemma 6

Proof. Suppose $\Sigma=\{a_{1},a_{2},\ldots,a_{k}\}$ , where $k\geq 3$ , and $\pi=x_{1}a_{i_{1}}x_{2}a_{i_{2}}\ldots a_{i_{n-1}}x_{n}$ , where $x_{1},x_{2},\ldots,x_{n}\in X$ . If $n=2$ , then one may verify directly that for any $b\in\Sigma\setminus\{a_{i_{1}}\}$ , $\{(a_{i_{1}},+),(ba_{i_{1}}b,+),(\varepsilon,-)\}$ is a teaching set for $\pi$ w.r.t. $R\Pi^{z}$ . We assume in what follows that $n\geq 3$ . Again, $T=\{(w_{1},+),(w_{2},+),(w_{3},-)\}$ will denote a teaching set for $\pi$ w.r.t. $R\Pi^{z}$ , where $w_{1},w_{2}$ and $w_{3}$ are defined below. Further, $\tau$ will denote a regular pattern that is consistent with $T$ .

$w_{1}$ :

For every substring of $\pi$ of the shape $a_{i_{j}}x_{j+1}a_{i_{j+1}}$ , define $\varphi(x_{j+1})$ according to the following case distinction.

Case i:

$i_{j}$ and $i_{j+1}$ have opposite parities. Set $\varphi(x_{j+1})=\varepsilon$ .

Case ii:

$i_{j}$ and $i_{j+1}$ have equal parities. Fix some $j^{\prime}\in\{1,\ldots,k\}$ such that $j^{\prime}$ and $i_{j}$ have opposite parities (which implies that $j^{\prime}$ and $i_{j+1}$ also have opposite parities), and set $\varphi(x_{j+1})=a_{j^{\prime}}$ . For all other variables $x$ occurring in $\pi$ , set $\varphi(x)=\varepsilon$ .

Set $w_{1}=\varphi(\pi)$ .

$w_{2}$ :

For every substring of $\pi$ of the shape $a_{i_{j}}x_{j+1}a_{i_{j+1}}$ , define $\psi(x_{j+1})$ according to the following case distinction.

Case i:

$i_{j}$ and $i_{j+1}$ have equal parities. Set $\psi(x_{j+1})=\varepsilon$ .

Case ii:

$i_{j}$ is even and $i_{j+1}$ is odd.

Case ii.1:

$j>1$ and $i_{j-1}$ is even. Pick any odd $j^{\prime}\in\{1,\ldots,k\}$ such that $a_{j^{\prime}}\neq a_{i_{j+1}}$ , and set $\psi(x_{j+1})=a_{j^{\prime}}$ .

Case ii.2:

$j>1$ and $i_{j-1}$ is odd, or $j=1$ . Pick any even $j^{\prime}\in\{1,\ldots,k\}$ and pick any odd $j^{\prime\prime}\in\{1,\ldots,k\}$ such that $a_{j^{\prime\prime}}\neq a_{i_{j+1}}$ , and set $\psi(x_{j+1})=a_{j^{\prime}}a_{j^{\prime\prime}}$ .

Case iii:

$i_{j}$ is odd and $i_{j+1}$ is even. Pick any odd $j^{\prime}\in\{1,\ldots,k\}$ such that $a_{j^{\prime}}\neq a_{i_{j}}$ , and set $\psi(x_{j+1})=a_{j^{\prime}}$ .

Furthermore, pick $j_{1},j_{2}\in\{1,\ldots,k\}$ such that $a_{j_{1}}\notin\{a_{i_{1}},a_{i_{2}}\}$ and $a_{j_{2}}\notin\{a_{i_{n-1}},a_{i_{n-2}}\}$ ; set $\psi(x_{1})=a_{j_{1}}$ and $\psi(x_{n})=a_{j_{2}}$ . (Such $j_{1}$ and $j_{2}$ must exist since $|\Sigma|\geq 3$ .) For all other variables $x$ occurring in $\pi$ , set $\psi(x)=\varepsilon$ . Set $w_{2}=\psi(\pi)$ .
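The case distinctions defining $\varphi$ and $\psi$ can be followed step by step in code. The Python sketch below is an illustration under stated assumptions and is not taken from the paper: the name `build_w1_w2`, the encoding of $\pi$ by the index list $[i_{1},\ldots,i_{n-1}]$ , and the rule of always picking the smallest admissible index wherever the proof says "pick any" are all choices made here. It assumes $n\geq 3$ and $|\Sigma|\geq 3$ .

```python
def build_w1_w2(alphabet, idx):
    """alphabet[j-1] is the letter a_j; idx = [i_1, ..., i_{n-1}] (1-based)."""
    k, m = len(alphabet), len(idx)            # m = n - 1 constants
    a = lambda j: alphabet[j - 1]
    odd = lambda j: j % 2 == 1
    # Assumption: take the smallest admissible index for every "pick any".
    pick = lambda ok: next(j for j in range(1, k + 1) if ok(j))

    phi = [""] * (m + 1)                      # phi[t] is the image of x_{t+1}
    psi = [""] * (m + 1)
    for t in range(1, m):                     # x_{t+1} lies between a_{i_t}, a_{i_{t+1}}
        left, right = idx[t - 1], idx[t]
        if odd(left) == odd(right):
            # w_1, Case ii: insert a letter whose index has the opposite parity.
            phi[t] = a(pick(lambda j: odd(j) != odd(left)))
        elif odd(right):
            # w_2, Case ii: left index even, right index odd.
            j_odd = pick(lambda j: odd(j) and a(j) != a(right))
            if t > 1 and not odd(idx[t - 2]):
                psi[t] = a(j_odd)                                  # Case ii.1
            else:
                psi[t] = a(pick(lambda j: not odd(j))) + a(j_odd)  # Case ii.2
        else:
            # w_2, Case iii: left index odd, right index even.
            psi[t] = a(pick(lambda j: odd(j) and a(j) != a(left)))
    # Outer variables of w_2 avoid the two nearest constants.
    psi[0] = a(pick(lambda j: a(j) not in {a(idx[0]), a(idx[1])}))
    psi[m] = a(pick(lambda j: a(j) not in {a(idx[-1]), a(idx[-2])}))

    def apply(s):
        # Interleave the variable images with the constants of pi.
        return s[0] + "".join(a(i) + v for i, v in zip(idx, s[1:]))

    return apply(phi), apply(psi)

# Pattern x1 0 x2 1 x3 2 x4 1 x5 1 x6 with a_1 = 0, a_2 = 1, a_3 = 2
w1, w2 = build_w1_w2("012", [1, 2, 3, 2, 2])
print(w1, w2)
```

Any other admissible choice of indices yields an equally valid pair $w_{1},w_{2}$ ; the smallest-index rule is used only to make the sketch deterministic.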

$w_{3}$ :

Arguing as in the proof of Lemma 5, the consistency of $\tau$ with $(w_{1},+)$ and $(w_{2},+)$ implies that $\tau$ is of the shape $x_{1}A_{1}x_{2}A_{2}\ldots A_{k-1}x_{k}$ , where every maximal constant block $A_{i}$ has length at most $2$ ; furthermore, if $A_{i}=a_{\ell}a_{\ell^{\prime}}$ , then $\ell$ and $\ell^{\prime}$ have opposite parities.

Note that Lemma 2 cannot be directly applied here since the consistency of $\tau$ with $(w_{1},+)$ and $(w_{2},+)$ does not imply that $\tau$ is simple block-regular. We will, however, give a different construction of $w_{3}$ by analysing a decomposition of $w_{2}$ containing subwords $\beta_{1},\beta_{2},\ldots,\beta_{n-2}$ such that any maximal constant block of $\tau$ is a subword of some $\beta_{j}$ (details are to follow).

For each $j\in\{1,\ldots,n-2\}$ , define $\beta_{j}\mathrel{\mathop{\mathchar 58\relax}}=a_{i_{j}}\psi(x_{j+1})a_{i_{j+1}}$ . The positions of $\beta_{1},\ldots,\beta_{n-2}$ are illustrated below.

[TABLE]

Corresponding to each $\beta_{j}$ , where $j\in\{1,\ldots,n-2\}$ , we define a word $\alpha_{j}$ based on the following case distinction.

Case i:

$\beta_{j}=a_{i_{j}}a_{i_{j+1}}$ , where $i_{j}$ and $i_{j+1}$ have equal parities.

Case i.1:

$i_{j}$ and $i_{j+1}$ are even.

Case i.1.1:

$j-1\geq 1$ and $i_{j-1}$ is odd, $j+2\leq n-1$ and $i_{j+2}$ is odd. Then $\psi(x_{j})=a_{j^{\prime}}$ for some odd $j^{\prime}$ such that $a_{j^{\prime}}\neq a_{i_{j-1}}$ and $\psi(x_{j+2})=a_{j^{\prime\prime}}$ for some odd $j^{\prime\prime}$ such that $a_{j^{\prime\prime}}\neq a_{i_{j+2}}$ . Set

[TABLE]

Case i.1.2:

$j-1\geq 1$ and $i_{j-1}$ is odd; either $j+2\leq n-1$ and $i_{j+2}$ is even, or $j+2>n-1$ . Then $\psi(x_{j})=a_{j^{\prime}}$ for some odd $j^{\prime}$ such that $a_{j^{\prime}}\neq a_{i_{j-1}}$ . If $j+2\leq n-1$ and $i_{j+2}$ is even, define $\alpha_{j}$ as in Case i.1.1 but with all occurrences of $a_{j^{\prime\prime}}$ deleted. If $j+2>n-1$ , define $\alpha_{j}$ as in Case i.1.1 but with all occurrences of $a_{j^{\prime\prime}}$ replaced with $\psi(x_{n})$ and $\psi(x_{n})$ appended to $\alpha_{j}$ .

Case i.1.3:

$j+2\leq n-1$ and $i_{j+2}$ is odd; either $j-1\geq 1$ and $i_{j-1}$ is even, or $j-1<1$ . Then $\psi(x_{j+2})=a_{j^{\prime\prime}}$ for some odd $j^{\prime\prime}$ such that $a_{j^{\prime\prime}}\neq a_{i_{j+2}}$ . If $j-1\geq 1$ and $i_{j-1}$ is even, define $\alpha_{j}$ as in Case i.1.1 but with all occurrences of $a_{j^{\prime}}$ deleted. If $j-1<1$ , define $\alpha_{j}$ as in Case i.1.1 but with all occurrences of $a_{j^{\prime}}$ replaced with $\psi(x_{1})$ and $\psi(x_{1})$ prepended to $\alpha_{j}$ .

Case i.1.4:

$j-1\geq 1$ and $i_{j-1}$ is even, or $j-1<1$ ; $j+2\leq n-1$ and $i_{j+2}$ is even, or $j+2>n-1$ . If $j-1\geq 1,j+2\leq n-1$ and both $i_{j-1},i_{j+2}$ are even, set

[TABLE]

If $j-1<1$ , set

[TABLE]

If $j+2>n-1$ , set

[TABLE]

Case i.2:

$i_{j}$ and $i_{j+1}$ are odd. If $j-1\geq 1$ and $j+2\leq n-1$ , set

[TABLE]

If $j-1<1$ , set

[TABLE]

If $j+2>n-1$ , set

[TABLE]

Case ii:

$i_{j}$ is odd and $i_{j+1}$ is even.

Case ii.1:

$j+2\leq n-1$ and $i_{j+2}$ is odd; $j-1\geq 1$ and $i_{j-1}$ is even. Suppose $\beta_{j}=a_{i_{j}}a_{j_{1}}a_{i_{j+1}}$ and $\beta_{j+1}=a_{i_{j+1}}a_{j_{2}}a_{j_{3}}a_{i_{j+2}}$ for some even $j_{2}$ and odd $j_{1}$ and $j_{3}$ , where $a_{j_{1}}\neq a_{i_{j}}$ and $a_{j_{3}}\neq a_{i_{j+2}}$ . Set

[TABLE]

Case ii.2:

$j+2\leq n-1$ and $i_{j+2}$ is odd; either $j-1\geq 1$ and $i_{j-1}$ is odd, or $j-1<1$ . If $j-1\geq 1$ and $i_{j-1}$ is odd, define $\alpha_{j}$ as in Case ii.1 (note that $\psi(x_{j})=\varepsilon$ in this case). If $j-1<1$ , define $\alpha_{j}$ as in Case ii.1 but with $\psi(x_{1})$ prepended to $\alpha_{j}$ .

Case ii.3:

$j-1\geq 1$ and $i_{j-1}$ is even; either $j+2\leq n-1$ and $i_{j+2}$ is even, or $j+2>n-1$ . Suppose $\beta_{j}=a_{i_{j}}a_{j_{1}}a_{i_{j+1}}$ for some odd $j_{1}$ such that $a_{j_{1}}\neq a_{i_{j}}$ . If $j+2\leq n-1$ and $i_{j+2}$ is even, set $\alpha_{j}=a_{j_{1}}a_{i_{j+1}}\psi(x_{j})a_{i_{j}}a_{j_{1}}$ . If $j+2>n-1$ , set $\alpha_{j}=a_{j_{1}}a_{i_{j+1}}\psi(x_{n})\psi(x_{j})\psi(x_{n})a_{i_{j}}a_{j_{1}}\psi(x_{n})$ .

Case ii.4:

$j-1\geq 1$ and $i_{j-1}$ is odd, or $j-1<1$ ; $j+2\leq n-1$ and $i_{j+2}$ is even, or $j+2>n-1$ . Suppose $\beta_{j}=a_{i_{j}}a_{j^{\prime}}a_{i_{j+1}}$ , where $j^{\prime}$ is odd and $a_{j^{\prime}}\neq a_{i_{j}}$ . If $j-1\geq 1$ , $i_{j-1}$ is odd, $j+2\leq n-1$ and $i_{j+2}$ is even, set $\alpha_{j}=a_{j^{\prime}}a_{i_{j+1}}a_{i_{j}}a_{j^{\prime}}$ . If $j-1<1$ , set $\alpha_{j}=\psi(x_{1})a_{j^{\prime}}a_{i_{j+1}}\psi(x_{1})a_{i_{j}}a_{j^{\prime}}$ . If $j+2>n-1$ , set $\alpha_{j}=a_{j^{\prime}}a_{i_{j+1}}\psi(x_{n})a_{i_{j}}a_{j^{\prime}}\psi(x_{n})$ .

Case iii:

$i_{j}$ is even and $i_{j+1}$ is odd.

Case iii.1:

$\beta_{j}=a_{i_{j}}a_{j_{1}}a_{j_{2}}a_{i_{j+1}}$ for some even $j_{1}$ and odd $j_{2}$ such that $a_{j_{2}}\neq a_{i_{j+1}}$ . Set

[TABLE]

Case iii.2:

$\beta_{j}=a_{i_{j}}a_{j_{2}}a_{i_{j+1}}$ for some odd $j_{2}$ such that $a_{j_{2}}\neq a_{i_{j+1}}$ (note that if $j-1\geq 1$ , then $i_{j-1}$ is even and so $\psi(x_{j})=\varepsilon$ ). Define $\alpha_{j}$ as in Case iii.1, but with all occurrences of $a_{j_{1}}$ deleted.

Set $w_{3}\mathrel{\mathop{\mathchar 58\relax}}=\alpha_{1}\alpha_{2}\ldots\alpha_{n-2}$ .

By construction, $w_{1}\in L(\pi)$ and $w_{2}\in L(\pi)$ . Furthermore, induction on $j=1,\ldots,n-2$ shows that the longest prefix of $x_{1}a_{i_{1}}x_{2}a_{i_{2}}x_{3}\ldots a_{i_{n-1}}x_{n}$ matching $\alpha_{1}\ldots\alpha_{j}$ is $x_{1}a_{i_{1}}x_{2}\ldots a_{i_{j}}x_{j+1}$ . Hence $w_{3}\notin L(\pi)$ . The lemma will follow from the next two claims.

Claim E.1

Suppose $h,g\mathrel{\mathop{\mathchar 58\relax}}(X\cup\Sigma)^{*}\mapsto\Sigma^{*}$ are constant-preserving morphisms witnessing $w_{1}\in L(\tau)$ and $w_{2}\in L(\tau)$ respectively, and suppose $\pi(\varepsilon)=a_{i_{1}}a_{i_{2}}\ldots a_{i_{n-1}}\sqsubseteq\tau(\varepsilon)$ . Let $\langle p_{1},p_{2},\ldots,p_{n-1}\rangle$ be a sequence of positions of $\tau$ such that $\tau[p_{j}]=a_{i_{j}}$ for all $j\in\{1,\ldots,n-1\}$ . For each $j\in\{1,\ldots,n-1\}$ , let $q_{j}$ be the position of $w_{1}$ occupied by the specific occurrence of $a_{i_{j}}$ indicated with braces in Equation (5).

[TABLE]

Similarly, let $R_{j}$ be the sequence of positions of $w_{2}$ indicated with braces in Equation (6).

[TABLE]

Let $I^{const}_{h,\tau}$ (resp. $I^{const}_{g,\tau}$ ) be the mapping of sequences of positions of constants in $\tau$ to sequences of positions of $w_{1}$ (resp. $w_{2}$ ) induced by $h$ (resp. $g$ ). Then for all $j\in\{1,\ldots,n-1\}$ , $I^{const}_{h,\tau}(\langle p_{j}\rangle)=\langle q_{j}\rangle$ and $I^{const}_{g,\tau}(\langle p_{j}\rangle)$ is a subsequence of $R_{j}$ .

In particular, if $a_{i_{1}}a_{i_{2}}\ldots a_{i_{n-1}}\sqsubseteq\tau(\varepsilon)$ , then $L(\tau)=L(\pi)$ .

Claim E.2

Let $\eta$ be any regular pattern such that $\{w_{1},w_{2}\}\subset L(\eta)$ and $a_{i_{1}}a_{i_{2}}\ldots a_{i_{n-1}}\not\sqsubseteq\eta(\varepsilon)$ . Then $w_{3}\in L(\eta)$ .

Proof of Claim E.1. Let $P_{1},P_{2},\ldots,P_{n-1}$ denote the sequences of positions of $w_{1}$ indicated by braces in Equation (7).

[TABLE]

It suffices to show that whenever $j\in\{1,\ldots,n-1\}$ , $I^{const}_{h,\tau}(\langle p_{j}\rangle)$ is a subsequence of $P_{j}$ ; the claim that $I^{const}_{h,\tau}(\langle p_{j}\rangle)=\langle q_{j}\rangle$ will then follow from the fact that $\varphi(x_{j})\notin N(x_{j},\pi)$ for all $j\in\{1,\ldots,n-1\}$ . So assume, by way of contradiction, that there were a least $\ell\in\{1,\ldots,n-1\}$ such that $I^{const}_{h,\tau}(\langle p_{\ell}\rangle)$ is not a subsequence of $P_{\ell}$ . First, suppose that $I^{const}_{h,\tau}(\langle p_{\ell}\rangle)$ were a subsequence of some $P_{\ell^{\prime}}$ with $\ell^{\prime}<\ell$ . Then, since $\varphi(x_{\ell})\notin\{a_{i_{\ell}},a_{i_{\ell-1}}\}$ and $\varphi(x_{\ell-1})\neq a_{i_{\ell-1}}$ , $I^{const}_{h,\tau}(\langle p_{\ell-1}\rangle)$ is not a subsequence of $P_{\ell-1}$ . Iterating the preceding argument then gives that for all $j\leq\ell$ , $I^{const}_{h,\tau}(\langle p_{j}\rangle)$ is not a subsequence of $P_{j}$ , a contradiction. A similar argument holds if $I^{const}_{h,\tau}(\langle p_{\ell}\rangle)$ were a subsequence of some $P_{\ell^{\prime\prime}}$ with $\ell^{\prime\prime}>\ell$ .

The proof that $I^{const}_{g,\tau}(\langle p_{j}\rangle)$ is a subsequence of $R_{j}$ is similar (making crucial use of the definition of $\psi$ ). This establishes the first part of the claim.

Now we establish the second part of the claim. Note that from the first part of the claim, if $i_{j}$ is odd, then $I^{const}_{g,\tau}(\langle p_{j}\rangle)$ cannot be a subsequence of the sequence of positions of $w_{2}$ corresponding to $\psi(x_{j})$ (resp. $\psi(x_{j+1})$ ). If $i_{j}$ is even, then $I^{const}_{g,\tau}(\langle p_{j}\rangle)$ cannot be a subsequence of the sequence of positions of $w_{2}$ corresponding to $\psi(x_{j})$ . Furthermore, suppose $I^{const}_{g,\tau}(\langle p_{j}\rangle)$ were a subsequence of the sequence of positions of $w_{2}$ corresponding to $\psi(x_{j+1})$ ; then if $j+1\leq n-1$ , $i_{j+1}$ must be odd and therefore $I^{const}_{g,\tau}(\langle p_{j+1}\rangle)$ equals $\langle q^{\prime}\rangle$ , where $q^{\prime}$ is the position of $w_{2}$ occupied by $a_{i_{j+1}}$ in $R_{j+1}$ .

From the fact that $\{w_{1},w_{2}\}\subset L(\tau)$ , we know that $\tau$ must start as well as end with variables. For any $\alpha\in(X\cup\Sigma)^{*}$ , let $o(\alpha)$ denote the number of substrings of $\alpha$ of the shape $b_{1}xb_{2}$ , where $x\in X\cup\{\varepsilon\}$ , $b_{1},b_{2}\in\Sigma$ and $b_{1},b_{2}$ have opposite parities. Note that $o(w_{2})=o(\pi)$ . Since $I^{const}_{h,\tau}(\langle p_{j}\rangle)=\langle q_{j}\rangle$ whenever $j\in\{1,\ldots,n-1\}$ , it follows that if $a_{i_{1}}\ldots a_{i_{n-1}}\sqsubset\tau(\varepsilon)$ , then there is some position $p^{\prime}$ of $\tau$ such that for some $j\in\{1,\ldots,n-2\}$ , $p_{j}<p^{\prime}<p_{j+1}$ and $\tau[p^{\prime}]=\varphi(x_{j+1})\in\Sigma$ . By the definition of $\varphi$ , if $\varphi(x_{j+1})=a_{j^{\prime}}$ , then $j^{\prime}$ has parity opposite to that of $i_{j}$ as well as $i_{j+1}$ . Thus $o(\tau)>o(\pi)$ . But $w_{2}\in L(\tau)$ implies $o(\tau)\leq o(\pi)$ , and therefore $\tau(\varepsilon)=a_{i_{1}}\ldots a_{i_{n-1}}$ . The fact that $w_{1}\in L(\tau)$ (resp. $w_{2}\in L(\tau)$ ) implies that a variable occurs in $\tau$ between every pair $a_{i_{j}},a_{i_{j+1}}$ such that $i_{j}$ and $i_{j+1}$ have equal (resp. opposite) parities. Thus $L(\tau)=L(\pi)$ . (Claim E.1)

Proof of Claim E.2. Our strategy to show $w_{3}\in L(\eta)$ is as follows. First, fix some constant-preserving morphism $g\mathrel{\mathop{\mathchar 58\relax}}(X\cup\Sigma)^{*}\mapsto\Sigma^{*}$ such that $g(\eta)=w_{2}$ . Then $g$ induces a mapping $\mathcal{I}_{g,\eta}$ of closed intervals of $\{1,\ldots,|\eta|\}$ to closed intervals of $\{1,\ldots,|w_{2}|\}$ such that for all $[p_{1},p_{2}]\subseteq\{1,\ldots,|\eta|\}$ , $g(\eta[p_{1}]\ldots\eta[p_{2}])=w_{2}[\mathcal{I}_{g,\eta}([p_{1},p_{2}])]$ . One may take the “inverse” $\overline{{\mathcal{I}}}_{g,\eta}$ of $\mathcal{I}_{g,\eta}$ , where, for all $[q_{1},q_{2}]\subseteq\{1,\ldots,|w_{2}|\}$ , $\overline{{\mathcal{I}}}_{g,\eta}([q_{1},q_{2}])=[s_{1},s_{2}]$ for some $s_{1},s_{2}\in\{1,\ldots,|\eta|\}$ such that $[q_{1},q_{2}]$ is a subinterval of $\mathcal{I}_{g,\eta}([s_{1},s_{2}])$ and for all proper subintervals $R$ of $[s_{1},s_{2}]$ , $[q_{1},q_{2}]$ is not a subinterval of $\mathcal{I}_{g,\eta}(R)$ . Let $r_{1},\ldots,r_{n-1}$ be the positions of $a_{i_{1}},\ldots,a_{i_{n-1}}$ respectively in $w_{2}$ marked with braces in Equation (8).

[TABLE]

By our assumption on $\eta$ , there is a least $\ell\in\{1,\ldots,n-1\}$ such that $\eta[\overline{{\mathcal{I}}}_{g,\eta}([r_{\ell},r_{\ell}])]$ is a variable and if there is a least $r^{\prime}>r_{\ell}$ such that $\eta[\overline{{\mathcal{I}}}_{g,\eta}([r^{\prime},r^{\prime}])]$ is a constant, then $\eta[\overline{{\mathcal{I}}}_{g,\eta}([r^{\prime},r^{\prime}])]\neq a_{i_{\ell}}$ . As argued at the beginning of the construction of $w_{3}$ , $\eta$ starts and ends with variables, and every maximal constant block $A$ of $\eta$ has length at most $2$ ; furthermore, if the length of $A$ is exactly $2$ , then $A=a_{j_{1}}a_{j_{2}}$ for some $j_{1},j_{2}\in\{1,\ldots,k\}$ such that $j_{1}$ and $j_{2}$ have opposite parities.

We define a set ${\mathcal{C}}$ consisting of all possible intervals of positions of $w_{2}$ of length at most $2$ such that for every maximal constant block of $\eta$ , say $\eta[J]$ for some closed interval $J\subseteq\{1,\ldots,|\eta|\}$ , there is an $I\in{\mathcal{C}}$ for which ${\mathcal{I}}_{g,\eta}(J)\subseteq I$ .

First, suppose $i_{\ell}$ is even and the first letter of $\psi(x_{\ell+1})$ equals $a_{i_{\ell}}$ . Then ${\mathcal{C}}$ consists of all intervals of positions of $w_{2}$ of the form

(i) $[q,q+1]$ , where $q<r_{\ell}-1$ and $w_{2}[q]w_{2}[q+1]=a_{j_{1}}a_{j_{2}}$ for some $j_{1},j_{2}\in\{1,\ldots,k\}$ with opposite parities, or

(ii) $[q,q+1]$ , where $q>r_{\ell}+1$ and $w_{2}[q]w_{2}[q+1]=a_{j_{1}}a_{j_{2}}$ for some $j_{1},j_{2}\in\{1,\ldots,k\}$ with opposite parities, or

(iii) $[q,q]$ , where $q<r_{\ell}-1$ and if $q\geq 2$ , then $w_{2}[q-1]w_{2}[q]w_{2}[q+1]=ba_{j_{3}}a_{j_{4}}$ for some $b\in\Sigma$ and $j_{3},j_{4}\in\{1,\ldots,k\}$ such that $j_{3}$ and $j_{4}$ have equal parities, and $b=a_{j_{5}}$ for some $j_{5}\in\{1,\ldots,k\}$ such that $j_{5}$ and $j_{3}$ have equal parities; if $q<2$ , then the same holds with $w_{2}[q-1]$ and $b$ replaced with $\varepsilon$ , or

(iv) $[q,q]$ , where $q=r_{\ell}-1$ and if $q\geq 2$ , then $w_{2}[q-1]w_{2}[q]=ba_{j_{6}}$ for some $j_{6}\in\{1,\ldots,k\}$ and $b\in\Sigma$ such that if $b=a_{j_{7}}$ for some $j_{7}\in\{1,\ldots,k\}$ , then $j_{7}$ and $j_{6}$ have equal parities; if $q<2$ , then the same holds with $w_{2}[q-1]$ and $b$ replaced with $\varepsilon$ , or

(v) $[q,q]$ for some $q>r_{\ell}+2$ such that if $q+1\leq|w_{2}|$ , then $w_{2}[q-1]w_{2}[q]w_{2}[q+1]=a_{j_{8}}a_{j_{9}}b$ for some $j_{8},j_{9}\in\{1,\ldots,k\}$ with equal parities and $b\in\Sigma$ such that if $b=a_{j_{10}}$ for some $j_{10}\in\{1,\ldots,k\}$ , then $j_{10}$ and $j_{9}$ have equal parities; if $q+1>|w_{2}|$ , then the same holds with $w_{2}[q+1]$ and $b$ replaced with $\varepsilon$ , or

(vi) $[q,q]$ , where $q=r_{\ell}+2$ and if $q+1\leq|w_{2}|$ , then $w_{2}[q]w_{2}[q+1]=a_{j_{11}}b$ for some $j_{11}\in\{1,\ldots,k\}$ and $b\in\Sigma$ such that if $b=a_{j_{12}}$ for some $j_{12}\in\{1,\ldots,k\}$ , then $j_{12}$ and $j_{11}$ have equal parities; if $q+1>|w_{2}|$ , then the same holds with $w_{2}[q+1]$ and $b$ replaced with $\varepsilon$ .

Second, suppose either $i_{\ell}$ is odd or the first letter of $\psi(x_{\ell+1})$ is not equal to $a_{i_{\ell}}$ . Then we define ${\mathcal{C}}$ exactly as above but with three differences: first, $q>r_{\ell}+1$ is replaced with $q>r_{\ell}$ in (ii); second, $q>r_{\ell}+2$ is replaced with $q>r_{\ell}+1$ in (v); third, $q=r_{\ell}+2$ is replaced with $q=r_{\ell}+1$ in (vi). We next define a one-one mapping $F$ from ${\mathcal{C}}$ to the set of all intervals of positions of $w_{3}$ satisfying the following conditions for all $[q,q],[q,q+1]\in{\mathcal{C}}$ :

• $F([q,q+1])=[q^{\prime},q^{\prime}+1]$ for some $q^{\prime}\in\{1,\ldots,|w_{3}|-1\}$ with $w_{2}[q]w_{2}[q+1]=w_{3}[q^{\prime}]w_{3}[q^{\prime}+1]$ .

• $F([q,q])=[q^{\prime},q^{\prime}]$ for some $q^{\prime}\in\{1,\ldots,|w_{3}|\}$ with $w_{2}[q]=w_{3}[q^{\prime}]$ .

• Suppose $q_{1}$ and $q_{2}$ are the left endpoints of $I_{1}$ and $I_{2}$ respectively, where $I_{1},I_{2}\in{\mathcal{C}}$ , $I_{1}\neq I_{2}$ and $q_{1}<q_{2}$ (note that no two distinct members of ${\mathcal{C}}$ intersect). Let $q^{\prime}_{1}$ and $q^{\prime}_{2}$ be the left endpoints of $F(I_{1})$ and $F(I_{2})$ respectively. Then $q^{\prime}_{1}<q^{\prime}_{2}$ and $F(I_{1})\cap F(I_{2})=\emptyset$ .

Note that the existence of an $F$ satisfying the above three conditions implies that for any sequence $\langle I_{1},I_{2},\ldots,I_{m}\rangle$ of intervals of positions of $w_{2}$ such that every $I_{i}$ corresponds to a maximal constant block of $\eta$ and for all $i,j\in\{1,\ldots,m\}$ with $i<j$ , $I_{i}\cap I_{j}=\emptyset$ , and the left endpoint of $I_{i}$ is smaller than that of $I_{j}$ , there is a corresponding sequence $\langle I^{\prime}_{1},I^{\prime}_{2},\ldots,I^{\prime}_{m}\rangle$ of intervals of positions of $w_{3}$ such that for all $i,j\in\{1,\ldots,m\}$ with $i<j$ , $w_{2}(I_{i})=w_{3}(I^{\prime}_{i})$ , $I^{\prime}_{i}\cap I^{\prime}_{j}=\emptyset$ , and the left endpoint of $I^{\prime}_{i}$ is smaller than that of $I^{\prime}_{j}$ . Thus, since $\eta$ starts as well as ends with variables, the existence of such an $F$ will suffice to show that $w_{3}\in L(\eta)$ . We consider a case distinction based on the earlier definition of ${\mathcal{C}}$ . Let $Q_{1},\ldots,Q_{n-2}$ be the closed intervals of positions of $w_{3}$ corresponding to the occurrences of $\alpha_{1},\ldots,\alpha_{n-2}$ respectively as shown in Equation (9).

[TABLE]

Consider any $I\in{\mathcal{C}}$ .

Case 1:

$I=[r_{j},r_{j}+1]$ for some $j<\ell$ , where, if $w_{2}[r_{j},r_{j}+1]=a_{j^{\prime}}a_{j^{\prime\prime}}$ for some $j^{\prime},j^{\prime\prime}\in\{1,\ldots,k\}$ , then $j^{\prime}$ and $j^{\prime\prime}$ have opposite parities. Note that if $i_{j}$ were odd, then by Cases i and iii in the construction of $w_{2}$ , $w_{2}[r_{j}+1]=a_{j^{\prime}}$ would imply that $j^{\prime}$ is odd, which is impossible by Conditions i and ii in the definition of ${\mathcal{C}}$ . Hence $i_{j}$ is even. Furthermore, an inspection of Cases i and ii in the construction of $w_{2}$ shows that $r_{j}+1\neq r_{j+1}$ , and therefore $r_{j}+1$ is the position of the first letter of $\psi(x_{j+1})$ in $w_{2}$ ; moreover, $i_{j+1}$ is odd. Suppose $\beta_{j}=a_{i_{j}}a_{j_{1}}a_{i_{j+1}}$ for some odd $j_{1}$ such that $a_{j_{1}}\neq a_{i_{j+1}}$ (the positions of $\beta_{1},\ldots,\beta_{n-2}$ are illustrated in Equation (4)). From Case iii.2 in the construction of $w_{3}$ , one sees that $\alpha_{j}=\gamma a_{i_{j}}a_{j_{1}}$ for some $\gamma\in\Sigma^{*}$ ; fix $\gamma$ . Set $F(I)=\left[\sum_{1\leq l<j}|\alpha_{l}|+|\gamma|+1,\sum_{1\leq l<j}|\alpha_{l}|+|\gamma|+2\right]$ .

Case 2:

$I=[r_{j}-1,r_{j}]$ for some $j<\ell$ . First, suppose $j-1\geq 1$ . Then an argument similar to that in Case 1 shows that $i_{j}$ must be even and $i_{j-1}$ must be odd. From Cases i.1.1 and iii in the construction of $w_{3}$ , one sees that $\alpha_{j}=\gamma_{1}w_{2}[r_{j}-1]a_{i_{j}}\gamma_{2}$ for some $\gamma_{1},\gamma_{2}\in\Sigma^{*}$ ; fix such $\gamma_{1}$ and $\gamma_{2}$ . Set $F(I)=\left[\sum_{1\leq l<j}|\alpha_{l}|+|\gamma_{1}|+1,\sum_{1\leq l<j}|\alpha_{l}|+|\gamma_{1}|+2\right]$ .

Second, suppose $j-1<1$ . From Cases i.1.3, i.1.4, i.2, ii.4 and iii in the construction of $w_{3}$ , we deduce that there are $\gamma_{1},\gamma_{2}\in\Sigma^{*}$ such that $\alpha_{1}=\gamma_{1}\psi(x_{1})a_{i_{1}}\gamma_{2}$ ; fix such $\gamma_{1}$ and $\gamma_{2}$ . Set $F(I)=\left[|\gamma_{1}|+1,|\gamma_{1}|+2\right]$ .

Case 3:

$I=[r_{j}+1,r_{j}+2]$ for some $j<\ell$ such that $\psi(x_{j+1})=w_{2}[r_{j}+1]w_{2}[r_{j}+2]$ . Based on the case distinction in the construction of $w_{2}$ , one sees that $i_{j}$ must be even and if $j+2\leq n-1$ , then $i_{j+1}$ must be odd. From Case iii.1, we deduce that $\alpha_{j}=\gamma\psi(x_{j+1})$ for some $\gamma\in\Sigma^{*}$ . Set $F(I)=\left[\sum_{1\leq l<j}|\alpha_{l}|+|\gamma|+1,\sum_{1\leq l<j}|\alpha_{l}|+|\gamma|+2\right]$ .

Case 4:

$I=[r_{j}-1,r_{j}]$ for some $j>\ell$ . Arguing as in the earlier cases, $i_{j}$ must be even and $i_{j-1}$ must be odd. By examining Case ii in the construction of $w_{3}$ , one sees that $\alpha_{j-1}=w_{2}[r_{j}-1]a_{i_{j}}\gamma$ for some $\gamma\in\Sigma^{*}$ . Set $F(I)=\left[\sum_{1\leq l<j-1}|\alpha_{l}|+1,\sum_{1\leq l<j-1}|\alpha_{l}|+2\right]$ .

Case 5:

$I=[r_{j},r_{j}+1]$ for some $j>\ell$ . First, suppose $j+1\leq n-1$ . Arguing as before, $i_{j}$ and $i_{j-1}$ must be even while $i_{j+1}$ must be odd. It follows from Case i.1 in the construction of $w_{3}$ that $\alpha_{j-1}=\gamma_{1}a_{i_{j}}w_{2}[r_{j}+1]\gamma_{2}$ for some $\gamma_{1},\gamma_{2}\in\Sigma^{*}$ ; fix such $\gamma_{1}$ and $\gamma_{2}$ . Set $F(I)=\left[\sum_{1\leq l<j-1}|\alpha_{l}|+|\gamma_{1}|+1,\sum_{1\leq l<j-1}|\alpha_{l}|+|\gamma_{1}|+2\right]$ .

Second, suppose $j+1>n-1$ , i.e. $j=n-1$ . It follows from Cases i.1.2 and i.1.4 in the construction of $w_{3}$ that for some $\gamma_{1},\gamma_{2}\in\Sigma^{*}$ , $\alpha_{n-2}=\gamma_{1}a_{i_{n-1}}\psi(x_{n})\gamma_{2}$ ; fix such $\gamma_{1}$ and $\gamma_{2}$ . Set $F(I)=\left[\sum_{1\leq l<j-1}|\alpha_{l}|+|\gamma_{1}|+1,\sum_{1\leq l<j-1}|\alpha_{l}|+|\gamma_{1}|+2\right]$ .

Case 6:

$I=[r_{j}+1,r_{j}+2]$ for some $j>\ell$ such that $\psi(x_{j+1})=w_{2}[r_{j}+1]w_{2}[r_{j}+2]$ . Based on the case distinction in the construction of $w_{2}$ , we deduce that $i_{j-1}$ and $i_{j+1}$ are odd while $i_{j}$ is even. It follows from Case ii in the construction of $w_{3}$ that $\alpha_{j-1}=\gamma_{1}a_{i_{j}}\gamma_{2}w_{2}[r_{j}+1]w_{2}[r_{j}+2]\gamma_{3}$ for some $\gamma_{1},\gamma_{2},\gamma_{3}\in\Sigma^{*}$ ; fix such $\gamma_{1},\gamma_{2}$ and $\gamma_{3}$ . Set $F(I)=\left[\sum_{1\leq l<j-1}|\alpha_{l}|+|\gamma_{1}|+|\gamma_{2}|+2,\sum_{1\leq l<j-1}|\alpha_{l}|+|\gamma_{1}|+|\gamma_{2}|+3\right]$ .

Case 7:

$I=[r_{j},r_{j}]$ for some $j<\ell$ .

Case 7.1:

$i_{j}$ is even. First, suppose $j-1\geq 1$ . Then both $i_{j-1}$ and $i_{j+1}$ must be even. From Case i.1 in the construction of $w_{3}$ , we deduce that there exist $\gamma_{1},\gamma_{2}\in\Sigma^{*}$ such that $\alpha_{j}=\gamma_{1}a_{i_{j}}\gamma_{2}$ and $|\gamma_{2}|\leq 1$ ; fix such $\gamma_{1}$ and $\gamma_{2}$ . Set $F(I)=\left[\sum_{1\leq l<j}|\alpha_{l}|+|\gamma_{1}|+1,\sum_{1\leq l<j}|\alpha_{l}|+|\gamma_{1}|+1\right]$ .

Second, suppose $j-1<1$ . It follows from Cases i.1 and iii in the construction of $w_{3}$ that there exist $\gamma_{1},\gamma_{2}\in\Sigma^{*}$ such that $\alpha_{1}=\gamma_{1}a_{i_{1}}\gamma_{2}$ and $|\gamma_{2}|\leq 2$ ; fix such $\gamma_{1}$ and $\gamma_{2}$ . Set $F(I)=\left[|\gamma_{1}|+1,|\gamma_{1}|+1\right]$ .

Case 7.2:

$i_{j}$ is odd. It follows from Cases i.2 and ii in the construction of $w_{3}$ that there exist $\gamma_{1},\gamma_{2}\in\Sigma^{*}$ such that $\alpha_{j}=\gamma_{1}a_{i_{j}}\gamma_{2}$ and $|\gamma_{2}|\leq 1$ ; fix such $\gamma_{1}$ and $\gamma_{2}$ . Set $F(I)=\left[\sum_{1\leq l<j}|\alpha_{l}|+|\gamma_{1}|+1,\sum_{1\leq l<j}|\alpha_{l}|+|\gamma_{1}|+1\right]$ .

Case 8:

$I=[r_{j},r_{j}]$ for some $j>\ell$ . From the case distinction in the construction of $w_{3}$ , we deduce that there exist $\gamma_{1},\gamma_{2}\in\Sigma^{*}$ with $|\gamma_{1}|\leq 1$ such that $\alpha_{j-1}=\gamma_{1}a_{i_{j}}\gamma_{2}$ ; fix such $\gamma_{1}$ and $\gamma_{2}$ . Set $F(I)=\left[\sum_{1\leq l<j-1}|\alpha_{l}|+|\gamma_{1}|+1,\sum_{1\leq l<j-1}|\alpha_{l}|+|\gamma_{1}|+1\right]$ .

Case 9:

$I=[1,1]$ . Observe from the construction of $w_{3}$ that $\alpha_{1}$ starts with $\psi(x_{1})$ . Set $F(I)=[1,1]$ .

Case 10:

$I=[|w_{2}|,|w_{2}|]$ . Observe from the construction of $w_{3}$ that $\alpha_{n-2}$ ends with $\psi(x_{n})$ . Set $F(I)=[|w_{3}|,|w_{3}|]$ .

This completes the definition of $F$ . By Claim E.2, since $\{w_{1},w_{2}\}\subset L(\tau)$ and $w_{3}\notin L(\tau)$ , one has that $a_{i_{1}}a_{i_{2}}\ldots a_{i_{n-1}}\sqsubseteq\tau(\varepsilon)$ . Thus by Claim E.1, $L(\tau)=L(\pi)$ . Therefore $T=\{(w_{1},+),(w_{2},+),(w_{3},-)\}$ is indeed a teaching set for $\pi$ w.r.t. $R\Pi^{z}$ .

F Example for Lemma 6

We give an example to illustrate the construction of the teaching set in the proof of Lemma 6.

Example F.1

Suppose $\Sigma=\{0,1,2\}$ . Following the notation of Lemma 6, set $a_{1}=0,a_{2}=1$ and $a_{3}=2$ . Let $\pi=x_{1}0x_{2}1x_{3}2x_{4}1x_{5}1x_{6}$ . According to the construction in the proof of Lemma 6, $\pi$ has the teaching set $\{(w_{1},+),(w_{2},+),(w_{3},-)\}$ w.r.t. $R\Pi^{3}$ , where $w_{1},w_{2}$ and $w_{3}$ are defined as follows ( $\varphi,\psi$ and $\alpha_{i}$ are defined as in the proof of Lemma 6):

• $w_{1}=0121\underbrace{0}_{\varphi(x_{5})}1$ .

• $w_{2}=\underbrace{2}_{\psi(x_{1})}0\underbrace{2}_{\psi(x_{2})}1\underbrace{10}_{\psi(x_{3})}2\underbrace{0}_{\psi(x_{4})}11\underbrace{0}_{\psi(x_{6})}$ . Here $\psi(x_{1})=2$ because $\psi(x_{1})\notin\{a_{i_{1}},a_{i_{2}}\}=\{0,1\}$ .

• $w_{3}=\underbrace{211102}_{\alpha_{1}}\underbrace{0202110}_{\alpha_{2}}\underbrace{011020}_{\alpha_{3}}\underbrace{010}_{\alpha_{4}}$ .
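The membership claims behind this example can be checked mechanically. The following Python sketch is an illustration and not part of the paper: the helper names `in_language` and `regular`, and the encoding of a regular pattern as a string in which every `x` stands for a distinct variable, are assumptions made here. Since each variable of a regular pattern occurs once and may be erased, membership reduces to a regular-expression match. The sketch checks $w_{1}\in L(\pi)$ , $w_{3}\notin L(\pi)$ , and the prefix property of $\alpha_{1}\ldots\alpha_{j}$ stated in the proof of Lemma 6.

```python
import re

def in_language(pattern, word):
    """Erasing membership for a regular pattern; each 'x' is a fresh variable."""
    regex = "".join(".*" if t == "x" else re.escape(t) for t in pattern)
    return re.fullmatch(regex, word) is not None

def regular(consts):
    """Encode x_1 c_1 x_2 ... c_m x_{m+1} for the constant string c_1 ... c_m."""
    return "x" + "".join(c + "x" for c in consts)

consts = "01211"                     # pi = x1 0 x2 1 x3 2 x4 1 x5 1 x6
pi = regular(consts)
alphas = ["211102", "0202110", "011020", "010"]   # alpha_1, ..., alpha_4
w1 = "012101"
w3 = "".join(alphas)

print(in_language(pi, w1))           # positive example
print(in_language(pi, w3))           # negative example

# Longest prefix of pi matching alpha_1 ... alpha_j ends at x_{j+1}.
for j in range(1, len(alphas) + 1):
    block = "".join(alphas[:j])
    print(j, in_language(regular(consts[:j]), block),
          in_language(regular(consts[:j + 1]), block))
```

For every $j$ the first test succeeds and the second fails, which is the property used to conclude $w_{3}\notin L(\pi)$ .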

G Proof of Theorem 7

Proof. Note that for any $0\in\Sigma$ , $x_{1}$ has the teaching set $\{(\varepsilon,+),(0,+)\}$ . Now suppose $\pi$ contains at least one constant symbol.

Assertion (i). If $\Sigma=\{0\}$ , then there is some $m\geq 1$ such that $\pi$ is equivalent to the pattern $0^{m}x_{1}$ , and so $\pi$ may be taught with the examples $(0^{m},+),(0^{m+1},+)$ and $(0^{m-1},-)$ .

If $|\Sigma|=\infty$ , then one can choose distinct constants $a_{1},a_{2},\ldots,a_{n}\in\Sigma\setminus\{c_{1},\ldots,c_{n-1}\}$ . Any pattern $\tau$ consistent with the examples $(\pi(\varepsilon),+)$ and $(\pi[x_{1}\rightarrow a_{1},x_{2}\rightarrow a_{2},\ldots,x_{n}\rightarrow a_{n}],+)$ must be simple block-regular and satisfy $\tau(\varepsilon)\sqsubseteq\pi(\varepsilon)$ . By Lemma 2, the example $(\widehat{\pi(\varepsilon)},-)$ will ensure, in addition, that $\tau(\varepsilon)\not\sqsubset\pi(\varepsilon)$ .

Finally, note that any simple block-regular pattern not equivalent to $x_{1}$ must be taught using at least $3$ examples (for a similar proof, see [7, Theorem 12.1]).

Assertion (ii). First, suppose $\Sigma=\{a_{1},a_{2},\ldots,a_{\ell}\}$ for some $\ell\geq 3$ . We show that any teaching set for $\pi$ w.r.t. $\Pi^{\ell}$ must contain at least $\left\lfloor\displaystyle\frac{n}{\ell}\right\rfloor$ positive examples. Assume that some teaching set $T$ for $\pi$ w.r.t. $\Pi^{\ell}$ contains $k$ positive examples $(w_{1},+),\ldots,(w_{k},+)$ for some $k\geq 1$ . For each $i\in\{1,\ldots,k\}$ , fix a substitution $h_{i}\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ such that $h_{i}(\pi)=w_{i}$ . Let $\{z^{i}_{j}\mathrel{\mathop{\mathchar 58\relax}}i,j\in{\mathbb{N}}\}$ be a subset of $X$ such that $z^{i}_{j}\neq z^{i^{\prime}}_{j^{\prime}}$ whenever $(i,j)\neq(i^{\prime},j^{\prime})$ . For each $i\in\{1,\ldots,k\}$ , let $g_{i}\mathrel{\mathop{\mathchar 58\relax}}\Sigma^{*}\mapsto X^{*}$ be a morphism such that $g_{i}(a_{j})=z^{i}_{j}$ for all $j\in\{1,\ldots,\ell\}$ . Let $\pi^{\prime}$ be the pattern derived from $\pi$ by replacing each $x\in\mbox{Var}(\pi)$ with the string $g_{1}(h_{1}(x))g_{2}(h_{2}(x))\ldots g_{k}(h_{k}(x))$ ; $\pi^{\prime}$ can be written in the form $A_{1}c_{1}A_{2}\ldots c_{n-1}A_{n}$ , where $A_{1},A_{2},\ldots,A_{n}\in\{z^{i}_{j}\mathrel{\mathop{\mathchar 58\relax}}1\leq i\leq k\wedge 1\leq j\leq\ell\}^{*}$ . By construction, $w_{i}\in L(\pi^{\prime})$ for all $i\in\{1,\ldots,k\}$ . In particular, note that if $\pi^{\prime}_{i}$ is the restriction of $\pi^{\prime}$ to $\{z^{i}_{1},\ldots,z^{i}_{\ell}\}\cup\Sigma$ , then $w_{i}\in L(\pi^{\prime}_{i})$ . Furthermore, since $\pi^{\prime}$ is similar to $\pi$ , one has $L(\pi^{\prime})\subseteq L(\pi)$ and so $\pi^{\prime}$ is consistent with $T$ . As $T$ is a teaching set for $\pi$ w.r.t. $\Pi^{\ell}$ , $L(\pi^{\prime})=L(\pi)$ and therefore every $A_{i}$ contains at least one free variable. Hence

$\left|\{x : \mbox{$x$ is a free variable of $\pi^{\prime}$}\}\right|\geq n.\qquad(10)$

On the other hand, since $\{x : \mbox{$x$ is a free variable of $\pi^{\prime}$}\}\subseteq\{z^{i}_{j} : 1\leq i\leq k\wedge 1\leq j\leq\ell\}$ , we have

$\left|\{x : \mbox{$x$ is a free variable of $\pi^{\prime}$}\}\right|\leq\ell k.\qquad(11)$

It now follows from Equations (10) and (11) that $n\leq\ell k$ , and therefore $k\geq\left\lfloor\displaystyle\frac{n}{\ell}\right\rfloor$ , as required.

The proof for binary alphabets is similar. Suppose $\Sigma=\{0,1\}$ . Define an operation $\mathcal{O}$ on any $\tau\in R\Pi^{2}$ as follows: pick the first occurrence of a substring of $\tau$ of the shape $x\delta x^{\prime}\overline{\delta}x^{\prime\prime}$ , where $x,x^{\prime},x^{\prime\prime}\in X$ and $\delta\in\Sigma$ , and delete $x^{\prime}$ . If no such substring occurs in $\tau$ , set $\mathcal{O}(\tau)=\tau$ . Then for all $\tau\in R\Pi^{2}$ , one has $\mathcal{O}(\tau)=\tau^{\prime}$ for some $\tau^{\prime}\in R\Pi^{2}$ with $L(\tau^{\prime})=L(\tau)$ [29, Lemma 2].

We iteratively apply $\mathcal{O}$ to $\pi$ until no new regular pattern is produced; that is to say, we find the least $k$ such that $\mathcal{O}^{k+1}(\pi)=\mathcal{O}^{k}(\pi)$. Setting $\tau^{\prime}=\mathcal{O}^{k}(\pi)$, notice that for all $\eta\in\Pi^{2}$ with $\eta$ similar to $\tau^{\prime}$ and $L(\eta)=L(\tau^{\prime})$, every maximal variable block of $\eta$ must contain a free variable. To see this, let $\eta=A_{1}c_{1}\ldots c_{n-1}A_{n}$ and $\tau^{\prime}=x_{1}c_{1}\ldots c_{n-1}x_{n}$ (after normalisation of $\tau^{\prime}$), where $A_{1},\ldots,A_{n}\in X^{*}$ and $c_{1},\ldots,c_{n-1}\in\{0,1,01,10\}$. Choose some $\delta_{1}\in\Sigma$ that differs from the first symbol of $c_{1}$, and set $w_{1}=\tau^{\prime}[x_{1}\rightarrow\delta_{1}]$. Since $L(\eta)=L(\tau^{\prime})$, we have $w_{1}\in L(\eta)$ and therefore $A_{1}$ must contain a free variable. A similar argument shows that $A_{n}$ contains a free variable. Now consider any $i\in\{2,\ldots,n-1\}$. If $\mbox{N}(x_{i},\tau^{\prime})=\{\delta\}$ for some $\delta\in\Sigma$, then setting $w_{i}=\tau^{\prime}[x_{i}\rightarrow\overline{\delta}]$ gives $w_{i}\in L(\tau^{\prime})=L(\eta)$ and so $A_{i}$ must contain a free variable. If $\mbox{N}(x_{i},\tau^{\prime})=\{0,1\}$, then at least one of $c_{i-1}$ and $c_{i}$, say $c_{i-1}$, equals $\delta\overline{\delta}$ for some $\delta\in\Sigma$. Pick $\delta^{\prime}\in\Sigma$ that differs from the first symbol of $c_{i}$ (if $c_{i}=\delta\overline{\delta}$ instead, let $\delta^{\prime}\in\Sigma$ be a letter that differs from the last symbol of $c_{i-1}$). Setting $w_{i}=\tau^{\prime}[x_{i}\rightarrow\delta^{\prime}]$ then gives $w_{i}\in L(\eta)$, and so $A_{i}$ contains a free variable.

The proof for the case $|\Sigma|\geq 3$ may now be applied to $\tau^{\prime}$ . Note that $|\mbox{Var}(\tau^{\prime})|\geq\left\lfloor\displaystyle\frac{2n}{3}\right\rfloor$ , and so the earlier proof gives that every teaching set for $\tau^{\prime}$ w.r.t. $\Pi^{2}$ must contain at least $\left\lfloor\displaystyle\frac{n}{3}\right\rfloor$ positive examples.

H Example for Theorem 7

Example H.1

We exhibit a family of simple block-regular patterns for which the lower bound given in Theorem 7(ii) is tight (up to numerical constant factors).

Suppose $z=|\Sigma|\geq 2$ and $0,1\in\Sigma$. For all $n\in{\mathbb{N}}$, let $\pi_{n}$ be the simple block-regular pattern $x_{1}0x_{2}0\ldots 0x_{n+1}$; in particular, $\pi_{n}(\varepsilon)=0^{n}$. We construct a teaching set $T$ for $\pi_{n}$ w.r.t. $\Pi^{z}$ as follows. Let $\tau$ denote any pattern that is consistent with $T$. First, put $(0^{n},+)$ into $T$; this example ensures that $\tau(\varepsilon)\sqsubseteq 0^{n}$. Next, for each $k\in\{0,\ldots,n-1\}$, put $(0^{k},-)$ into $T$. The examples put into $T$ so far ensure that $\tau(\varepsilon)=0^{n}$. Now for all $i\in\{1,\ldots,n+1\}$, put $\left(\pi_{n}[x_{i}\rightarrow 1,x_{j}\rightarrow\varepsilon,j\in\{1,\ldots,n+1\}\setminus\{i\}],+\right)$ into $T$. The last set of examples ensures that every maximal variable block of $\tau$ contains at least one free variable. Thus $L(\tau)=L(\pi_{n})$, and this proves that $\pi_{n}$ has a teaching set w.r.t. $\Pi^{z}$ of size $2n+2=O(n)$.
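The sample described in Example H.1 is easy to generate explicitly. The following Python sketch lists the labelled examples for a given $n$; the function name `teaching_set` and the representation of examples as `(word, label)` pairs are assumptions made for illustration.

```python
def teaching_set(n):
    """Labelled examples for pi_n = x_1 0 x_2 0 ... 0 x_{n+1}, following Example H.1."""
    sample = [("0" * n, "+")]                      # pi_n(eps) = 0^n
    sample += [("0" * k, "-") for k in range(n)]   # 0^k for every k < n
    # x_i -> 1 with all other variables erased yields 0^{i-1} 1 0^{n-i+1}.
    sample += [("0" * (i - 1) + "1" + "0" * (n - i + 1), "+")
               for i in range(1, n + 2)]
    return sample

sample_for_3 = teaching_set(3)
```

For $n=3$ the sketch produces one positive example $0^{3}$, three negative examples and four positive examples containing a single $1$, that is, $2n+2$ examples in total.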

I Proof of Lemma 9

Proof. Given that $L(\tau)\not\subseteq L(\pi)$ and $\tau(\varepsilon)=\pi(\varepsilon)$ , both $\tau$ and $\pi$ contain at least one variable, and so there is some $S\subseteq\mbox{Var}(\tau)$ of minimum possible size such that $L\left(\tau{\big{|}}_{\Sigma\cup S}\right)\not\subseteq L(\pi)$ . Fix such an $S$ . By the choice of $S$ , one has $L\left(\tau{\big{|}}_{\Sigma\cup(S\setminus\{y^{\prime}\})}\right)\subseteq L(\pi)$ for all $y^{\prime}\in S$ . Fix any $y\in S$ , and set $S^{\prime}\mathrel{\mathop{\mathchar 58\relax}}=S\setminus\{y\}$ . Without loss of generality, assume $S^{\prime}=\{x_{1},\ldots,x_{\ell}\}$ ( $S^{\prime}$ may also be empty). As noted earlier, $L\left(\tau{\big{|}}_{\Sigma\cup S^{\prime}}\right)\subseteq L(\pi)$ .

Now suppose, by way of contradiction, that $|S|>1+\left(|\pi(\varepsilon)|+m+4\right)\cdot|\mbox{Var}(\pi)|$ . Let $\varphi\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ be the substitution defined by $\varphi(x_{i})=01^{2i\cdot|\tau|}0$ for all $i\in\{1,\ldots,\ell\}$ and $\varphi(z)=\varepsilon$ for all $z\in X\setminus\{x_{1},\ldots,x_{\ell}\}$ . Set $w\mathrel{\mathop{\mathchar 58\relax}}=\varphi(\tau)$ . We first establish the following claim.

Claim I.1

For all $i\in\{1,\ldots,\ell\}$ , $w$ contains exactly $m$ occurrences of $\varphi(x_{i})=01^{2i\cdot|\tau|}0$ . Furthermore, all $m$ occurrences of $\varphi(x_{i})$ are disjoint.

Proof of Claim I.1. Fix any $i\in\{1,\ldots,\ell\}$ . Since $\tau\in\mbox{QR$ \Pi $}^{z}_{\infty,m}$ , there are at least $m$ occurrences of $\varphi(x_{i})$ in $w$ . We show that there cannot be any occurrence of $\varphi(x_{i})$ that overlaps with (i) a constant part of $\tau$ , or (ii) an occurrence of $\varphi(x_{j})$ for some $j\in\{1,\ldots,\ell\}$ such that $\varphi(x_{j})$ and $\varphi(x_{i})$ occupy different intervals of positions of $w$ .

Assume otherwise. Consider any $j\in\{1,\ldots,\ell\}$. Since $\varphi(x_{j})$ starts and ends with $0$, the occurrences of $\varphi(x_{j})$ and $\varphi(x_{i})$ coincide or $\varphi(x_{j})$ overlaps with $\varphi(x_{i})$ only at the first or last position of $\varphi(x_{i})$.

First, suppose an occurrence of $\varphi(x_{i})$ overlaps with a constant part of $\tau$ . Since $|01^{2i\cdot|\tau|}0|>|\tau|$ , this occurrence of $\varphi(x_{i})$ must overlap with an occurrence of $\varphi(x_{j})$ that is generated by a variable of $\tau$ for some $j\in\{1,\ldots,\ell\}$ . By the observation in the preceding paragraph, since the occurrences of $\varphi(x_{j})$ and $\varphi(x_{i})$ must be different, $\varphi(x_{j})$ can overlap with $\varphi(x_{i})$ only at the first or last position of $\varphi(x_{i})$ . It follows that each of the $2i\cdot|\tau|$ occurrences of $1$ in $\varphi(x_{i})$ must overlap with a constant part of $\tau$ , which is impossible as $2i\cdot|\tau|>|\tau|$ .

Second, suppose an occurrence of $\varphi(x_{i})$ overlaps with an occurrence of $\varphi(x_{j})$ for some $j\in\{1,\ldots,\ell\}$ such that $\varphi(x_{j})$ and $\varphi(x_{i})$ occupy different intervals of positions of $w$ . An argument similar to that in the preceding paragraph shows that each of the $2i\cdot|\tau|$ occurrences of $1$ in $\varphi(x_{i})$ must overlap with a constant part of $\tau$ , which is impossible. (Claim I.1)
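Claim I.1 can be checked on small instances. The sketch below applies the substitution $\varphi$ to a toy quasi-regular pattern and counts (possibly overlapping) occurrences of each $\varphi(x_{i})$. The token encoding is an assumption for this illustration only: an integer $i$ stands for $x_{i}\in S^{\prime}$, the strings `"0"` and `"1"` are constants, and any other token is a variable outside $S^{\prime}$, which $\varphi$ erases.

```python
def phi_block(i, size):
    """phi(x_i) = 0 1^{2 i |tau|} 0, where size = |tau|."""
    return "0" + "1" * (2 * i * size) + "0"

def phi_image(tau, ell):
    """Apply phi to tau under the token encoding described above."""
    size = len(tau)
    out = []
    for tok in tau:
        if isinstance(tok, int) and 1 <= tok <= ell:
            out.append(phi_block(tok, size))
        elif tok in ("0", "1"):
            out.append(tok)
        # every other token is a variable that phi maps to the empty word
    return "".join(out)

def count_occurrences(word, sub):
    """Number of start positions of sub in word (overlaps are counted)."""
    return sum(word.startswith(sub, p) for p in range(len(word)))

# Toy pattern with m = 2: x_1, x_2 and y each occur exactly twice.
tau = [1, "0", 2, "y", 1, "1", "0", 2, "y"]
w = phi_image(tau, ell=2)
counts = {i: count_occurrences(w, phi_block(i, len(tau))) for i in (1, 2)}
```

Here each $\varphi(x_{i})$ occurs exactly $m=2$ times in $w$, as the claim predicts; a single example is of course no substitute for the proof.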

Let the variable part of $\tau{\big{|}}_{\Sigma\cup S^{\prime}}$ (i.e. $\tau{\big{|}}_{S^{\prime}}$ ) be $x_{i_{1}}\ldots x_{i_{m\ell}}$ (since $\tau{\big{|}}_{\Sigma\cup S^{\prime}}$ has $\ell$ distinct variables, it has $m\ell$ variable occurrences). Set $c=|\pi(\varepsilon)|$ , and write $w$ as

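$$w=\underbrace{\gamma_{1}}_{J_{1}}\underbrace{\varphi(x_{i_{1}})}_{H_{1}}\underbrace{\gamma_{2}}_{J_{2}}\underbrace{\varphi(x_{i_{2}})}_{H_{2}}\ldots\underbrace{\gamma_{m\ell}}_{J_{m\ell}}\underbrace{\varphi(x_{i_{m\ell}})}_{H_{m\ell}}\underbrace{\gamma_{m\ell+1}}_{J_{m\ell+1}},\qquad(12)$$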

where $\gamma_{1},\ldots,\gamma_{m\ell+1}\in\Sigma^{*}$, $\tau(\varepsilon)=\gamma_{1}\gamma_{2}\ldots\gamma_{m\ell+1}$ and $J_{1},H_{1},\ldots,J_{m\ell},H_{m\ell},J_{m\ell+1}$ are the intervals of positions of $w$ corresponding to the subwords marked in Equation (12). Since $w\in L\left(\tau{\big{|}}_{\Sigma\cup S^{\prime}}\right)\subseteq L(\pi)$, there is a morphism $\theta\colon X^{*}\mapsto\Sigma^{*}$ such that $\theta(\pi)=w$.

We claim that for all $j\in\{1,\ldots,m\ell-m-c-3\}$ , ${\mathcal{I}}_{\theta,\pi}$ maps the positions of at least two variable occurrences of $\pi$ to intervals of positions of $w$ that overlap with the interval corresponding to

$$\gamma_{j+1}\varphi(x_{i_{j+1}})\gamma_{j+2}\ldots\varphi(x_{i_{j+c+m+2}})\gamma_{j+c+m+3}.$$

Formally, this means there are at least two positions of $\pi$ occupied by variables, say $p_{1}$ and $p_{2}$ , such that

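$${\mathcal{I}}_{\theta,\pi}(p_{k})\cap\left(J_{j+1}\cup H_{j+1}\cup\ldots\cup H_{j+c+m+2}\cup J_{j+c+m+3}\right)\neq\emptyset\qquad(13)$$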

for $k\in\{1,2\}$ . Suppose the latter statement does not hold. For all $i\in\{1,\ldots,\ell\}$ , since $|\varphi(x_{i})|>|\tau|\geq|S|+c>m\cdot|\mbox{Var}(\pi)|+c=|\pi|$ , no constant part of $\pi$ can cover $\varphi(x_{i})$ , and so there must be some $q\in\{1,\ldots,|\pi|\}$ such that $\pi[q]$ is a variable and ${\mathcal{I}}_{\theta,\pi}(q)$ covers $J_{j+1}\cup H_{j+1}\cup\ldots\cup H_{j+c+m+2}\cup J_{j+c+m+3}$ , i.e.

$$J_{j+1}\cup H_{j+1}\cup\ldots\cup H_{j+c+m+2}\cup J_{j+c+m+3}\subseteq{\mathcal{I}}_{\theta,\pi}(q).$$

Since every variable of $\pi$ occurs exactly $m$ times, there must be at least $m$ occurrences of

$$w^{\prime}:=\gamma_{j+1}\varphi(x_{i_{j+1}})\gamma_{j+2}\ldots\varphi(x_{i_{j+c+m+2}})\gamma_{j+c+m+3}$$

in $w$ . According to Claim I.1, $\varphi(x_{i})$ occurs exactly $m$ times in $w$ for all $i\in\{1,\ldots,\ell\}$ , and all its $m$ occurrences are disjoint. Thus for all distinct $j_{1},j_{2}\in\{j+1,\ldots,j+c+m+2\}$ , $i_{j_{1}}\neq i_{j_{2}}$ . Furthermore, since there are at most $c$ indices $i$ with $\gamma_{i}\neq\varepsilon$ , $w^{\prime}$ contains at least $c+m+1-c=m+1$ subwords of the shape $\varphi(x_{i_{j_{1}}})\varphi(x_{i_{j_{2}}})$ , where $j_{1}\neq j_{2}$ and $j_{1},j_{2}\in\{1,\ldots,\ell\}$ . This means there are at least $m+1$ pairs $(j_{1},j_{2})$ with $j_{1}\neq j_{2}$ and $j_{1},j_{2}\in\{1,\ldots,\ell\}$ such that $\tau{\big{|}}_{\Sigma\cup S^{\prime}}$ contains exactly $m$ occurrences of the substring $x_{j_{1}}x_{j_{2}}$ . Since $y$ occurs exactly $m$ times in $\tau{\big{|}}_{\Sigma\cup S}$ (we recall that $S=S^{\prime}\cup\{y\}$ ), there is at least one pair $(k_{1},k_{2})$ with $k_{1}\neq k_{2}$ and $k_{1},k_{2}\in\{1,\ldots,\ell\}$ such that $x_{k_{1}}x_{k_{2}}$ occurs exactly $m$ times in $\tau{\big{|}}_{\Sigma\cup S}$ . But by Theorem 17, $\tau{\big{|}}_{\Sigma\cup S}$ would then be equivalent to $\tau{\big{|}}_{\Sigma\cup(S\setminus\{x_{k_{2}}\})}$ , contradicting the minimality of $|S|$ . Thus there are indeed at least $2$ positions of variables in $\pi$ , say $p_{1}$ and $p_{2}$ , such that (13) holds.

Arguing inductively, it follows that the number of variable occurrences of $\pi$ (including variable repetitions) is at least $\displaystyle\frac{m\ell}{c+m+4}$ . Consequently,

$$m\cdot|\mbox{Var}(\pi)|\geq\frac{m\ell}{c+m+4},$$

and so

$$|S|=\ell+1\leq 1+\left(|\pi(\varepsilon)|+m+4\right)\cdot|\mbox{Var}(\pi)|,$$

as desired.

J Proof of Theorem 10

Proof. We first consider the case $z=1$. Suppose $\Sigma=\{0\}$. Every pattern in $\mbox{QR}\Pi^{1}_{\infty,m}$ is equivalent to a pattern of the shape $0^{k}x^{m}$ or $0^{k^{\prime}}$, where $k\in{\mathbb{N}}_{0}$ and $k^{\prime}\in{\mathbb{N}}$. Let $\pi:=0^{k}x^{m}$. If $k\geq m$, then $\pi$ can be taught using the sample $\{(0^{k},+),(0^{k+m},+),(0^{k-m},-)\}$: the two examples $(0^{k},+)$ and $(0^{k-m},-)$ uniquely identify the constant part of $\pi$, while $(0^{k+m},+)$ distinguishes $\pi$ from the constant pattern $0^{k}$. If $k<m$, then $\{(0^{k},+),(0^{k+m},+)\}$ is a teaching set for $\pi$: since $k<m$, $(0^{k},+)$ already uniquely identifies the constant part of $\pi$, while as before $(0^{k+m},+)$ ensures that $\pi$ is not a constant pattern. Let $\pi^{\prime}:=0^{k^{\prime}}$. Then $\{(0^{k^{\prime}},+),(0^{k^{\prime}+m},-)\}$ is a teaching set for $\pi^{\prime}$: the constant part of any pattern $\tau$ consistent with $(0^{k^{\prime}},+)$ is equal to $0^{k^{\prime\prime}}$ for some $k^{\prime\prime}\leq k^{\prime}$; if $L(\tau)\neq L(\pi^{\prime})$, then $\tau$ contains a variable $x$ such that for some $i\geq 0$ with $k^{\prime\prime}+mi=k^{\prime}$, $0^{k^{\prime}}$ is obtained from $\tau$ by substituting $0^{i}$ for $x$. Replacing $x$ with $0^{i+1}$ yields $0^{k^{\prime}+m}\in L(\tau)$, and so $\tau$ is inconsistent with $(0^{k^{\prime}+m},-)$. In any one of the above cases, the teaching dimension of the pattern w.r.t. $\mbox{QR}\Pi^{1}_{\infty,m}$ is at most $3$. Furthermore, suppose $\eta:=0^{m}x^{m}$. Any teaching set for $\eta$ must contain at least one positive and one negative example since $L(0^{m})\subset L(\eta)$ and $L(\eta)\subset L(x^{m})$; an additional positive example is needed to distinguish $\eta$ from all constant patterns. Hence $\mbox{TD}(\eta,\mbox{QR}\Pi^{1}_{\infty,m})\geq 3$.
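The three unary teaching sets above can be checked by brute force. In the sketch below a word $0^{n}$ is represented by its length $n$, the hypothesis `("var", k)` stands for $0^{k}x^{m}$ and `("const", k)` for $0^{k}$ with $k\geq 1$; this encoding, the helper names and the finite search bound are assumptions made for illustration.

```python
def member(h, n, m):
    """Is 0^n in the language of hypothesis h?"""
    kind, k = h
    if kind == "const":
        return n == k
    return n >= k and (n - k) % m == 0   # 0^k x^m generates {k + m*i : i >= 0}

def consistent_hypotheses(sample, m, bound):
    """All hypotheses with parameter at most bound that agree with every labelled example."""
    hyps = [("var", k) for k in range(bound + 1)]
    hyps += [("const", k) for k in range(1, bound + 1)]
    return [h for h in hyps
            if all(member(h, n, m) == (label == "+") for n, label in sample)]

m = 3
case_k_ge_m = consistent_hypotheses([(5, "+"), (8, "+"), (2, "-")], m, bound=20)  # pi = 0^5 x^3
case_k_lt_m = consistent_hypotheses([(1, "+"), (4, "+")], m, bound=20)            # pi = 0^1 x^3
case_const = consistent_hypotheses([(7, "+"), (10, "-")], m, bound=20)            # pi' = 0^7
```

In each case exactly one hypothesis survives within the search bound, namely the target pattern, which agrees with the argument in the proof.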

Now suppose $z\geq 2$ . Fix any $\pi\in\mbox{QR$ \Pi $}^{z}_{k,m}$ . We build a teaching set $T$ for $\pi$ w.r.t. $\mbox{QR$ \Pi $}^{z}_{\infty,m}$ . Let $\eta$ denote any pattern in $\mbox{QR$ \Pi $}^{z}_{\infty,m}$ that is consistent with $T$ . First, put $(\pi(\varepsilon),+)$ into $T$ . Next, for every $w\sqsubset\pi(\varepsilon)$ , put $(w,-)$ into $T$ . The $O(2^{|\pi(\varepsilon)|})$ examples added to $T$ up to the present stage ensure that $\eta(\varepsilon)=\pi(\varepsilon)$ . By [28], there is a finite tell-tale set for $\pi$ w.r.t. $\mbox{QR$ \Pi $}^{z}_{\infty,m}$ , that is, a finite set $S\subseteq L(\pi)$ such that for all $\tau\in\mbox{QR$ \Pi $}^{z}_{\infty,m}$ , one has $S\subseteq L(\tau)\subseteq L(\pi)\Rightarrow L(\tau)=L(\pi)$ ; furthermore, [28, Lemma 9] implies that this set $S$ has size $O(\lceil D_{1}\cdot(|\pi(\varepsilon)|+D_{1}\cdot m)^{D_{1}\cdot m}\rceil)$ , where $D_{1}\mathrel{\mathop{\mathchar 58\relax}}=(1/m)\cdot(2|\pi|-|\pi(\varepsilon)|)$ . Put $\{(w^{\prime},+)\mathrel{\mathop{\mathchar 58\relax}}w^{\prime}\in S\}$ into $T$ . The examples in $T$ now ensure that $\eta(\varepsilon)=\pi(\varepsilon)$ and $L(\eta)\not\subset L(\pi)$ . Thus if $L(\eta)\neq L(\pi)$ , then $L(\eta)\not\subseteq L(\pi)$ . Next, for each $\tau\in\mbox{QR$ \Pi $}^{z}_{1+\left(|\pi(\varepsilon)|+m+4\right)\cdot\left|\mbox{Var}(\pi)\right|,m}$ such that $L(\tau)\not\subseteq L(\pi)$ and $\tau(\varepsilon)=\pi(\varepsilon)$ , pick some $v_{\tau}\in L(\tau)\setminus L(\pi)$ and put $(v_{\tau},-)$ into $T$ ; note that there are $O(D_{2}\cdot(|\pi(\varepsilon)|+D_{2}\cdot m)^{D_{2}\cdot m})$ many such $\tau$ (up to equivalence), where $D_{2}\mathrel{\mathop{\mathchar 58\relax}}=1+(|\pi(\varepsilon)|+m+4)\cdot|\mbox{Var}(\pi)|$ . 
As was observed earlier, if $L(\eta)\neq L(\pi)$ , then $L(\eta)\not\subseteq L(\pi)$ , and so by Lemma 9, $\eta(\varepsilon)=\pi(\varepsilon)$ implies there is some $\tau^{\prime}\in\mbox{QR$ \Pi $}^{z}_{1+\left(|\pi(\varepsilon)|+m+4\right)\cdot\left|\mbox{Var}(\pi)\right|,m}$ with $L(\tau^{\prime})\subseteq L(\eta)$ and $L(\tau^{\prime})\not\subseteq L(\pi)$ ; the negative example $(v_{\tau^{\prime}},-)$ would therefore ensure that $\eta$ is inconsistent with $T$ . At this stage, $T$ has altogether $O(2^{|\pi(\varepsilon)|}+D\cdot(|\pi(\varepsilon)|+D\cdot m)^{D\cdot m})$ examples, where $D\mathrel{\mathop{\mathchar 58\relax}}=\max(\{(1/m)\cdot(2\cdot|\pi|-|\pi(\varepsilon)|),1+(|\pi(\varepsilon)|+m+4)\cdot|\mbox{Var}(\pi)|\})$ .
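To get a feeling for the parameters in this bound, the following sketch evaluates $D_{1}$, $D_{2}$ and $D$ for a concrete pattern. The function name and argument names are hypothetical, and the constants hidden in the $O(\cdot)$ notation are ignored.

```python
from fractions import Fraction

def bound_parameters(pattern_len, const_len, num_vars, m):
    """Return (D_1, D_2, D) as defined in the proof of Theorem 10.

    pattern_len = |pi|, const_len = |pi(eps)|, num_vars = |Var(pi)|."""
    d1 = Fraction(2 * pattern_len - const_len, m)
    d2 = 1 + (const_len + m + 4) * num_vars
    return d1, d2, max(d1, d2)

# pi = x1 0 x2 x1 1 x2 with m = 2: |pi| = 6, |pi(eps)| = 2, |Var(pi)| = 2.
d1, d2, d = bound_parameters(pattern_len=6, const_len=2, num_vars=2, m=2)
```

For this pattern $D_{1}=5$ and $D_{2}=D=17$, so the second term $D\cdot(|\pi(\varepsilon)|+D\cdot m)^{D\cdot m}$ already has exponent $34$.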

K Proof of Theorem 11

We first observe a basic fact about graph colourings. We recall that for any finite, simple graph $G=(V,E)$ , the distance between any two vertices $u$ and $v$ , denoted $d_{G}(u,v)$ , is the length of a shortest path in $G$ from $u$ to $v$ (or vice-versa; if no such path exists, then $d_{G}(u,v)=\infty$ ), and for any $\ell\geq 1$ , the $\ell$ -distance chromatic number of $G$ , denoted $\chi_{\ell}(G)$ , is the smallest $k$ for which there exists a $k$ -colouring of $G$ such that for any pair of vertices $s,t$ of $G$ with $d_{G}(s,t)\leq\ell$ , $s$ and $t$ receive distinct colours; such a colouring is called an $\ell$ -distance colouring of $G$ [21, 24].

Lemma K.1

Let $G=(V,E)$ be any finite, simple graph with vertex set $V$ , edge set $E$ and maximum degree $\Delta(G)$ . Then $\chi_{2}(G)\leq\Delta(G)^{2}+1$ ; equality occurs if $G$ is the $5$ -cycle.

Proof. We note that $\chi_{2}(G)$ is equal to $\chi_{1}(G^{2})$, the (ordinary) chromatic number of the square of $G$; $G^{2}$ is the graph whose vertex set is equal to that of $G$ and for all distinct vertices $v_{1},v_{2}$ of $G$, $(v_{1},v_{2})$ is an edge of $G^{2}$ iff $d_{G}(v_{1},v_{2})\leq 2$. The maximum degree of any vertex of $G^{2}$ is at most $\Delta(G)+\Delta(G)\cdot(\Delta(G)-1)=\Delta(G)^{2}$, and so by Brooks' theorem [21, Theorem 11], $\chi_{1}(G^{2})\leq\Delta(G)^{2}+1$; equality occurs if $G$ is the $5$-cycle. (Lemma K.1)
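The equality case of Lemma K.1 can be confirmed computationally for the $5$-cycle. The sketch below builds $G^{2}$ from an adjacency dictionary and computes its chromatic number by exhaustive search; the helper names are hypothetical and the search is only feasible for very small graphs.

```python
from itertools import product

def square(adj):
    """Adjacency of G^2: distinct vertices are joined iff their distance in G is at most 2."""
    sq = {v: set(nb) for v, nb in adj.items()}
    for v, nb in adj.items():
        for u in nb:
            sq[v] |= adj[u]
        sq[v].discard(v)
    return sq

def chromatic_number(adj):
    """Smallest k admitting a proper k-colouring (brute force)."""
    vs = sorted(adj)
    for k in range(1, len(vs) + 1):
        for colours in product(range(k), repeat=len(vs)):
            c = dict(zip(vs, colours))
            if all(c[u] != c[v] for u in vs for v in adj[u]):
                return k
    return 0  # graph without vertices

cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
chi2_cycle5 = chromatic_number(square(cycle5))
max_degree = max(len(nb) for nb in cycle5.values())
```

The square of the $5$-cycle is the complete graph on five vertices, so $\chi_{2}$ equals $\Delta(G)^{2}+1=5$.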

Proof of Theorem 11. If $m=1$, then $\mbox{QR}\Pi^{z}_{\infty,m,cf}$ contains only the pattern $x$ and so $\mbox{TD}(\mbox{QR}\Pi^{z}_{\infty,1,cf})=\mbox{PBTD}(\mbox{QR}\Pi^{z}_{\infty,1,cf})=0$. Suppose $m\geq 2$. Given $\pi,\tau\in\mbox{QR}\Pi^{z}_{\infty,m,cf}$ that are succinct, define $\pi\prec\tau$ iff $|\tau|<|\pi|$. For any succinct pattern $\pi\in\mbox{QR}\Pi^{z}_{\infty,m,cf}$ with $\mbox{Var}(\pi)=\{x_{1},\ldots,x_{n}\}$, define the adjacency graph of $\pi$, denoted $\mbox{AG}(\pi)$, to be the bipartite graph whose vertex set comprises two copies of $\mbox{Var}(\pi)$, one denoted $\mbox{Var}(\pi)^{L}:=\{x_{1}^{L},\ldots,x_{n}^{L}\}$ and the other denoted $\mbox{Var}(\pi)^{R}:=\{x_{1}^{R},\ldots,x_{n}^{R}\}$, such that an edge connects $x_{i}^{L}$ and $x_{j}^{R}$ iff $x_{i}x_{j}$ is a substring of $\pi$ [26, Chapter 3]. We find the least $k$ such that some $k$-colouring $c\colon\mbox{Var}(\pi)^{L}\cup\mbox{Var}(\pi)^{R}\mapsto\{1,\ldots,k\}$ of $\mbox{AG}(\pi)$ satisfies the following conditions.

1. For all $i\in\{1,\ldots,n\}$, $c(x_{i}^{L})=c(x_{i}^{R})$.

2. For any distinct $j_{1},j_{2}\in\{1,\ldots,n\}$, if $(x_{i}^{L},x_{j_{1}}^{R})\in E(\mbox{AG}(\pi))$ and $(x_{i}^{L},x_{j_{2}}^{R})\in E(\mbox{AG}(\pi))$ (resp. $(x_{j_{1}}^{L},x_{i}^{R})\in E(\mbox{AG}(\pi))$ and $(x_{j_{2}}^{L},x_{i}^{R})\in E(\mbox{AG}(\pi))$), then $c(x_{j_{1}}^{R})\neq c(x_{j_{2}}^{R})$ (resp. $c(x_{j_{1}}^{L})\neq c(x_{j_{2}}^{L})$).

We show that $k\leq 4m^{2}+1$ . Let $G$ be the graph obtained from $\mbox{AG}(\pi)$ by contracting the pair $(x_{i}^{L},x_{i}^{R})$ of vertices for all $i\in\{1,\ldots,n\}$ (i.e. the vertices $x_{i}^{L}$ and $x_{i}^{R}$ are replaced with a single vertex $x_{i}$ such that $x_{i}$ is adjacent to any vertex to which $x_{i}^{L}$ and $x_{i}^{R}$ were originally adjacent) and deleting all loops. Choose the minimum $k^{\prime}$ such that some colouring $c^{\prime}\mathrel{\mathop{\mathchar 58\relax}}V(G)\mapsto\{1,\ldots,k^{\prime}\}$ is a $2$ -distance colouring of $G$ . Let $c^{\prime\prime}\mathrel{\mathop{\mathchar 58\relax}}\mbox{Var}(\pi)^{L}\cup\mbox{Var}(\pi)^{R}\mapsto\{1,\ldots,k^{\prime}\}$ be the colouring of $\mbox{AG}(\pi)$ defined by $c^{\prime\prime}(x_{i}^{L})=c^{\prime\prime}(x_{i}^{R})=c^{\prime}(x_{i})$ for all $i\in\{1,\ldots,n\}$ . Note that for any distinct $j_{1},j_{2}\in\{1,\ldots,n\}$ , $(x_{i}^{L},x_{j_{1}}^{R})\in E(\mbox{AG}(\pi))$ and $(x_{i}^{L},x_{j_{2}}^{R})\in E(\mbox{AG}(\pi))$ (resp. $(x_{j_{1}}^{L},x_{i}^{R})\in E(\mbox{AG}(\pi))$ and $(x_{j_{2}}^{L},x_{i}^{R})\in E(\mbox{AG}(\pi))$ ) together imply that $d_{G}(x_{j_{1}},x_{j_{2}})\leq d_{\mbox{AG}(\pi)}(x_{j_{1}}^{R},x_{j_{2}}^{R})\leq 2$ (resp. $d_{G}(x_{j_{1}},x_{j_{2}})\leq d_{\mbox{AG}(\pi)}(x_{j_{1}}^{L},x_{j_{2}}^{L})\leq 2$ ); hence $c^{\prime\prime}$ satisfies Conditions 1 and 2 with $c^{\prime\prime}$ in place of $c$ , and therefore $k\leq k^{\prime}$ . Furthermore, $\Delta(G)$ is equal to the maximum, over all $i\in\{1,\ldots,n\}$ , of the number of substrings of $\pi$ of the shape $x_{j}x_{i}$ or $x_{i}x_{j^{\prime}}$ (where $j\neq i$ and $j^{\prime}\neq i$ ); this is bounded above by $2m$ because every variable of $\pi$ occurs exactly $m$ times. Thus by Lemma K.1, $k\leq k^{\prime}\leq 4m^{2}+1$ .
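The colouring argument can be traced on a small constant-free pattern. The sketch below builds the contracted graph $G$, colours it greedily so that vertices within distance $2$ receive distinct colours, and checks Condition 2; Condition 1 holds automatically because one colour is stored per variable. A greedy colouring is used here instead of a minimum one, which is an assumption of this sketch: it need not attain the least $k^{\prime}$, but it never uses more than $\Delta(G)^{2}+1$ colours. Variables are encoded as integers and all function names are hypothetical.

```python
def contracted_graph(pattern):
    """Graph G of the proof: x_i ~ x_j (i != j) iff x_i x_j or x_j x_i is a substring."""
    adj = {v: set() for v in pattern}
    for a, b in zip(pattern, pattern[1:]):
        if a != b:              # loops are deleted
            adj[a].add(b)
            adj[b].add(a)
    return adj

def greedy_two_distance_colouring(adj):
    """Give each vertex the least colour unused within distance 2 among coloured vertices."""
    colour = {}
    for v in sorted(adj):
        near = set(adj[v])
        for u in adj[v]:
            near |= adj[u]
        near.discard(v)
        used = {colour[u] for u in near if u in colour}
        colour[v] = next(c for c in range(1, len(adj) + 2) if c not in used)
    return colour

def satisfies_condition_2(pattern, colour):
    """Successors (resp. predecessors) of a common variable must have distinct colours."""
    succ, pred = {}, {}
    for a, b in zip(pattern, pattern[1:]):
        succ.setdefault(a, set()).add(b)
        pred.setdefault(b, set()).add(a)
    for nbrs in list(succ.values()) + list(pred.values()):
        cols = [colour[v] for v in nbrs]
        if len(cols) != len(set(cols)):
            return False
    return True

m = 2
pattern = [1, 2, 3, 1, 3, 2]    # x1 x2 x3 x1 x3 x2, every variable occurs m = 2 times
G = contracted_graph(pattern)
colouring = greedy_two_distance_colouring(G)
```

For this pattern $G$ is a triangle, three colours suffice, and the number of colours stays well below $4m^{2}+1=17$.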

Fix distinct letters $a_{1},\ldots,a_{k}\in\Sigma$ and any strictly increasing sequence $2<p_{1}<\ldots<p_{n}$ of positive integers. For each $i\in\{1,\ldots,n\}$ , fix some $\xi_{i}\in\{1,\ldots,k\}$ such that $\xi_{i}\neq c(x_{i})$ . Let $\varphi\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ be the substitution defined by $\varphi(x_{i})=a_{c(x_{i})}a_{\xi_{i}}^{p_{i}}a_{c(x_{i})}$ for all $i\in\{1,\ldots,n\}$ and $\varphi(x^{\prime})=\varepsilon$ for all $x^{\prime}\in X\setminus\mbox{Var}(\pi)$ . Set $w\mathrel{\mathop{\mathchar 58\relax}}=\varphi(\pi)$ . Thus if $\pi=x_{l_{1}}x_{l_{2}}\ldots x_{l_{n^{\prime}}}$ ,

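$$w=\underbrace{a_{c(x_{l_{1}})}a_{\xi_{l_{1}}}^{p_{l_{1}}}a_{c(x_{l_{1}})}}_{I_{1}}\underbrace{a_{c(x_{l_{2}})}a_{\xi_{l_{2}}}^{p_{l_{2}}}a_{c(x_{l_{2}})}}_{I_{2}}\ldots\underbrace{a_{c(x_{l_{n^{\prime}}})}a_{\xi_{l_{n^{\prime}}}}^{p_{l_{n^{\prime}}}}a_{c(x_{l_{n^{\prime}}})}}_{I_{n^{\prime}}}.\qquad(15)$$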

Let $\tau$ be any succinct pattern in $\mbox{QR$ \Pi $}^{z}_{\infty,m,cf}$ such that $w\in L(\tau)$ and $\tau\not\prec\pi$ . It will be argued that $L(\tau)=L(\pi)$ . Suppose $\psi\mathrel{\mathop{\mathchar 58\relax}}X^{*}\mapsto\Sigma^{*}$ is a morphism witnessing $w\in L(\tau)$ . Let $I_{1},\ldots,I_{n^{\prime}}$ be the closed intervals corresponding to the positions of the subwords of $w$ marked with braces in (15). We show that for each $j\in\{1,\ldots,n^{\prime}\}$ , there is some $j^{\prime}\in\{1,\ldots,|\tau|\}$ such that $I_{j}\subseteq{\mathcal{I}}_{\psi,\tau}(j^{\prime})$ , i.e. there is a single position of $\tau$ that is mapped under $\psi$ to a subword of $w$ covering $a_{c(x_{l_{j}})}a_{\xi_{l_{j}}}^{p_{l_{j}}}a_{c(x_{l_{j}})}$ . Assume otherwise; let $i_{0}\in\{1,\ldots,n^{\prime}\}$ be the least integer for which the latter statement is false. It follows that $I_{i_{0}}$ contains a cut-point of $w$ relative to $(\psi,\tau)$ . Further, one observes that ${\mathcal{I}}_{\psi,\tau}$ cannot map any single position of $\tau$ to a proper superset of $I_{i}$ for any given $i\in\{1,\ldots,n^{\prime}\}$ :

Claim K.2

Fix any $x\in\mbox{Var}(\tau)$ . For all $i\in\{1,\ldots,n\}$ and $j\in\{1,\ldots,k\}$ , neither $a_{j}a_{c(x_{i})}a_{\xi_{i}}^{p_{i}}a_{c(x_{i})}$ nor $a_{c(x_{i})}a_{\xi_{i}}^{p_{i}}a_{c(x_{i})}a_{j}$ is a subword of $\psi(x)$ .

Proof of Claim K.2. Suppose, by way of contradiction, that $a_{j}a_{c(x_{i})}a_{\xi_{i}}^{p_{i}}a_{c(x_{i})}$ were a subword of $\psi(x)$ for some $x\in\mbox{Var}(\tau)$ . Since $a_{c(x_{i})}\neq a_{\xi_{i}}$ (by the choice of $\xi_{i}$ ), $p_{i}\geq 3$ and $p_{i^{\prime}}\neq p_{j^{\prime}}$ for all distinct $i^{\prime},j^{\prime}\in\{1,\ldots,n\}$ , there are exactly $m$ (non-overlapping) occurrences of the word $a_{c(x_{i})}a_{\xi_{i}}^{p_{i}}a_{c(x_{i})}$ in $w$ . Suppose these occurrences are represented by the intervals $I_{j_{1}},\ldots,I_{j_{m}}$ of positions of $w$ , where $j_{1}<\ldots<j_{m}$ . Hence if $x$ occupies positions $q_{1},\ldots,q_{m}$ of $\tau$ , where $q_{1}<\ldots<q_{m}$ , then $I_{j_{\ell}}\subset{\mathcal{I}}_{\psi,\tau}(q_{\ell})$ for all $\ell\in\{1,\ldots,m\}$ . As $a_{j}$ occupies the position just before the leftmost point of $I_{j_{\ell}}$ in $w$ for all $\ell\in\{1,\ldots,m\}$ , $x$ cannot be the first symbol of $\tau$ . Thus there is some $x_{j^{\prime}}\in\mbox{Var}(\tau)$ with $j^{\prime}\neq i$ such that $j=c(x_{j^{\prime}})$ , which means that $a_{c(x_{j^{\prime}})}a_{c(x_{i})}a_{\xi_{i}}^{p_{i}}a_{c(x_{i})}$ occurs exactly $m$ times in $w$ . Now there cannot be exactly $m$ occurrences of the substring $x_{j^{\prime}}x$ in $\tau$ ; otherwise, the subpattern obtained from $\tau$ by deleting all occurrences of $x_{j^{\prime}}$ would be equivalent to $\tau$ , contradicting the succinctness of $\tau$ . Therefore there must be some $x_{j^{\prime\prime}}\in\mbox{Var}(\tau)$ (possibly equal to $x$ ) with $j^{\prime\prime}\neq j^{\prime}$ such that $x_{j^{\prime\prime}}x$ is a substring of $\tau$ , and so $a_{c(x_{j^{\prime\prime}})}a_{c(x_{i})}a_{\xi_{i}}^{p_{i}}a_{c(x_{i})}$ must be a subword of $w$ . 
However, by the choice of $c$ – in particular, Condition 2, $c(x_{j^{\prime}})\neq c(x_{j^{\prime\prime}})$ and thus $a_{c(x_{j^{\prime}})}a_{c(x_{i})}a_{\xi_{i}}^{p_{i}}a_{c(x_{i})}$ cannot occur exactly $m$ times in $w$ , a contradiction. An analogous proof shows that $a_{c(x_{i})}a_{\xi_{i}}^{p_{i}}a_{c(x_{i})}a_{j}$ cannot be a subword of $\psi(x)$ for any given $j\in\{1,\ldots,k\}$ . (Claim K.2)

By Claim K.2 and the choice of $i_{0}$ (which implies, in particular, that $I_{i_{0}}$ contains a cut-point), $\left|\overline{{\mathcal{I}}}_{\psi,\tau}\left(\bigcup_{\ell\leq i_{0}}I_{\ell}\right)\right|\geq i_{0}+1$ . By applying Claim K.2 successively to $w(I_{i_{0}}),w(I_{i_{0}+1}),\ldots,w(I_{n^{\prime}})$ , it follows that for $j=i_{0}+1,i_{0}+2,\ldots,n^{\prime}$ , $\left|\overline{{\mathcal{I}}}_{\psi,\tau}\left(\bigcup_{\ell\leq j}I_{\ell}\right)\right|\geq j+1$ and so $|\tau|\geq n^{\prime}+1$ , implying that $\tau\prec\pi$ , contrary to assumption.

Consequently, for each $j\in\{1,\ldots,n^{\prime}\}$, there is some $j^{\prime}\in\{1,\ldots,|\tau|\}$ such that $I_{j}\subseteq{\mathcal{I}}_{\psi,\tau}(j^{\prime})$; by Claim K.2, one also has ${\mathcal{I}}_{\psi,\tau}(j^{\prime})\subseteq I_{j}$. Thus, since $a_{c(x_{l_{i}})}a_{\xi_{l_{i}}}^{p_{l_{i}}}a_{c(x_{l_{i}})}$ occurs exactly $m$ times in $w$ for all $i\in\{1,\ldots,n^{\prime}\}$ and the subword of $w$ corresponding to the interval $I_{i^{\prime}}$ is different from that corresponding to $I_{i^{\prime\prime}}$ whenever $l_{i^{\prime}}\neq l_{i^{\prime\prime}}$, one has (after normalising $\tau$ and $\pi$) $\pi\sqsubseteq\tau$. As $|\tau|\leq|\pi|$, it follows that $L(\tau)=L(\pi)$, as required.

Remark K.3

The notion of the adjacency graph of a (constant-free) pattern was introduced in the study of pattern avoidance [26, Chapter 3]. We do not know whether the lower bound on $|\Sigma|$ in Theorem 11 is tight. The minimum number of colours needed to satisfy Conditions 1 and 2 in the proof of Theorem 11 might be smaller than $4m^{2}+1$; if so, this would give a reduction in the minimum alphabet size needed for the theorem to hold. In fact, the upper bound on $k$ in the proof of Theorem 11 would still hold if the second condition on $c$ is weakened as follows: if there are distinct $j_{1},j_{2}\in\{1,\ldots,n\}$ such that $(x_{i}^{L},x_{j_{1}}^{R})\in E(\mbox{AG}(\pi))$ and $(x_{i}^{L},x_{j_{2}}^{R})\in E(\mbox{AG}(\pi))$ (resp. $(x_{j_{1}}^{L},x_{i}^{R})\in E(\mbox{AG}(\pi))$ and $(x_{j_{2}}^{L},x_{i}^{R})\in E(\mbox{AG}(\pi))$), then $x_{i}^{L}$ (resp. $x_{i}^{R}$) is adjacent to at least two vertices that are assigned different colours.

L Proof of Proposition 12

Proof. We first note that over a unary alphabet $\Sigma=\{0\}$ , any pattern of the shape $0^{k}x_{1}^{m}\ldots x_{n}^{m}$ , where $k\geq 0$ and $n\geq 1$ , is equivalent to $0^{k}x^{m}$ . Given any patterns $\pi$ and $\pi^{\prime}$ of the shape $0^{k}x^{m}$ or $0^{k}$ , define $\pi\prec\pi^{\prime}$ iff

1. $\pi^{\prime}$ is a constant pattern and $\pi$ contains at least one variable, or

2. both $\pi$ and $\pi^{\prime}$ are non-constant patterns and $|\pi(\varepsilon)|<|\pi^{\prime}(\varepsilon)|$.

For any constant pattern $\pi$ , a teaching set for $\pi$ w.r.t. $(\mbox{QR$ \Pi $}^{1}_{\infty,m},\prec)$ is $\{(\pi,+)\}$ : $\pi$ is preferred to all non-constant patterns while any constant pattern different from $\pi$ cannot be consistent with $(\pi,+)$ . For any pattern $\tau\mathrel{\mathop{\mathchar 58\relax}}=0^{k}x^{m}$ , where $k\geq 0$ , a teaching set for $\tau$ w.r.t. $(\mbox{QR$ \Pi $}^{1}_{\infty,m},\prec)$ is $\{(0^{k},+),(0^{k+m},+)\}$ : no constant pattern can be consistent with this sample; furthermore, since the constant part of any pattern consistent with this sample has length at most $|\tau|$ , it follows from Condition 2 above that $\tau$ is preferred to all $\tau^{\prime}$ such that $\tau^{\prime}$ is consistent with the sample and $L(\tau^{\prime})\neq L(\tau)$ .

To see that $\mbox{PBTD}(\mbox{QR$ \Pi $}^{z}_{\infty,m})\geq 2$ for all $z\geq 1$ , one may apply [17, Theorem 34]; according to this theorem, $\mbox{PBTD}(\mbox{QR$ \Pi $}^{1}_{\infty,m})>1$ because $\mbox{QR$ \Pi $}^{1}_{\infty,m}$ contains all constant patterns as well as infinitely many patterns that generate infinite languages.

M Proof of Lemma 13

Proof. Assertion (i). Assume, by way of contradiction, that there is a least $j_{0}$ such that $h(x_{j_{0}})$ does not satisfy the claim. It will be shown by induction that for every variable $x$ of $\pi$ that does not lie to the left of $x_{j_{0}}^{n_{j_{0}}}$, $h(x)$ ends with $0$; since $w$ ends with $1$, this would contradict the fact that $h(\pi)=w$.

By the choice of $x_{j_{0}}$, $h(x_{j_{0}})$ has one of the following shapes: (1) $0^{p}$ for some $p\in\{1,\ldots,\ell\}$, (2) $0^{p^{\prime}}1\ldots 10^{p^{\prime\prime}}1$ for some $p^{\prime},p^{\prime\prime}\in\{1,\ldots,\ell\}$ with $p^{\prime\prime}>p^{\prime}$, or (3) $0^{p^{\prime\prime\prime}}1\ldots 10^{p^{\prime\prime\prime\prime}}$ for some $p^{\prime\prime\prime},p^{\prime\prime\prime\prime}\in\{1,\ldots,\ell\}$. If $h(x_{j_{0}})$ has the shape given in (2), then, since $x_{j_{0}}$ occurs at least twice in $\pi$, $w$ must contain a subword of the shape $0^{p^{\prime\prime}}10^{p^{\prime}}1$ for some $p^{\prime},p^{\prime\prime}\in\{1,\ldots,\ell\}$ with $p^{\prime}<p^{\prime\prime}$, which is impossible (as seen from the shape of $w$ in Equation (2)). Hence (1) or (3) holds, so the induction statement (i.e. that for every variable $x$ of $\pi$ that does not lie to the left of $x_{j_{0}}^{n_{j_{0}}}$, $h(x)$ ends with $0$) holds for $x=x_{j_{0}}$.

Now consider any variable $x$ of $\pi$ that lies to the right of $x_{j_{0}}^{n_{j_{0}}}$. By the induction hypothesis, it may be assumed that for every variable $x^{\prime}$ of $\pi$ lying to the right of $x_{j_{0}}^{n_{j_{0}}}$ and to the left of $x$, $h(x^{\prime})$ ends with $0$. If $h(x)$ starts with $1$, then, since $x$ is repeated at least once in $\pi$ and every occurrence of $1$ in $w$ is preceded by $0$, $h(x)$ must end with $0$. Suppose $h(x)$ starts with $0$ and ends with $1$. If $x^{\prime}$ is the variable immediately preceding $x$ in $\pi$, then by the induction hypothesis, $h(x^{\prime})$ is of the shape $\alpha 0$ for some $\alpha\in\{0,1\}^{*}$; thus, since $x$ occurs at least twice in $\pi$, if $\pi^{\prime}$ denotes the suffix of $\pi$ starting at the first occurrence of $x$, then $h(\pi^{\prime})$ is of the shape $0^{p_{0}}10^{p_{1}}1\beta$ for some $p_{0},p_{1}\in\{1,\ldots,\ell\}$ with $p_{1}>p_{0}$ and some $\beta\in\{0,1\}^{*}$. As $p_{1}>p_{0}$, $h(x)$ cannot be equal to $0^{p_{0}}1$, and therefore $h(x)$ must be of the shape $0^{p_{0}}1\ldots 0^{p_{2}}1$ for some $p_{2}\in\{1,\ldots,\ell\}$ with $p_{2}>p_{0}$. But $w$ does not contain any subword of the shape $0^{p_{2}}10^{p_{0}}1$ with $p_{2}\in\{1,\ldots,\ell\}$ and $p_{0}<p_{2}$. The latter contradiction implies that if $h(x)$ starts with $0$, then it must also end with $0$. This completes the induction step and establishes the claim.

Assertion (ii). It suffices to show that for all $j\in\{0,\ldots,k\}$ , there is some $j^{\prime}\in\{1,\ldots,\ell\}$ such that ${\mathcal{I}}_{h,\pi}(J_{j})\subseteq I_{j^{\prime}}$ . By Assertion (i), there are $j^{\prime\prime}\in\{1,\ldots,\ell\}$ and $i^{\prime\prime}\in\{1,\ldots,i_{j^{\prime\prime}}\}$ such that $h(x_{j}^{n_{j}})=(0^{j^{\prime\prime}}1)^{i^{\prime\prime}}$ . Furthermore, if $j\geq 1$ , then $h(x_{j-1}^{n_{j-1}})$ ends with $1$ . One observes from Equation (2) that any occurrence of $0^{j^{\prime\prime}}1$ in $w$ that starts after an occurrence of $1$ or is a prefix of $w$ must belong to the interval $I_{j^{\prime\prime}}$ . Consequently, ${\mathcal{I}}_{h,\pi}(J_{j})\subseteq I_{j^{\prime\prime}}$ , as was to be shown.

N Proof of Theorem 14

Proof. If $m=1$ , then $\mbox{NC}\Pi^{z}_{\infty,m}$ contains only the pattern $x_{1}$ (up to equivalence) and thus $\mbox{TD}(\mbox{NC}\Pi^{z}_{\infty,1})=\mbox{PBTD}(\mbox{NC}\Pi^{z}_{\infty,1})=0$ . Suppose $m\geq 2$ .

Assertion (i). Suppose $\Sigma=\{0\}$. We identify every pattern language $L(\pi)$ such that $\pi=x_{0}^{n_{0}}\ldots x_{k}^{n_{k}}$ with its Parikh image $\{\vec{v}\cdot\vec{x}:\vec{x}\in\mathbb{N}_{0}^{k+1}\}$, where $\vec{v}=(n_{0},\ldots,n_{k})$. Thus teaching $\mbox{NC}\Pi^{z}_{\infty,m}$ is equivalent to teaching the class ${\mathcal{C}}_{m}:=\{\{\vec{v}\cdot\vec{x}:\vec{x}\in\mathbb{N}_{0}^{k}\}:\vec{v}\in\{1,\ldots,m\}^{k}\wedge k\in\mathbb{N}\}$. Since the PBTD is a lower bound for the TD, it suffices to show that $\mbox{TD}(L,{\mathcal{C}}_{m})=O(m)$ for all $L\in{\mathcal{C}}_{m}$ and $\mbox{PBTD}({\mathcal{C}}_{m})=\Omega(m)$. Let $L=\{\vec{v}\cdot\vec{x}:\vec{x}\in\mathbb{N}_{0}^{k+1}\}$, where $\vec{v}=(n_{0},\ldots,n_{k})\in\{1,\ldots,m\}^{k+1}$; without loss of generality, it may be assumed that for all distinct $i$ and $j$, $n_{i}$ does not divide $n_{j}$ (otherwise, if $n_{i}\mid n_{j}$, then the linear set $L^{\prime}$ obtained from $L$ by deleting the entry $n_{j}$ from $\vec{v}$ in the definition of $L$ would be equal to $L$). It is shown that $L$ can be taught w.r.t. ${\mathcal{C}}_{m}$ using at most $m+1$ examples. Let $T$ be the sample consisting of all pairs $(p,\ell_{p})$ such that $p\leq m$ and $\ell_{p}=+$ if $p\in L$ and $\ell_{p}=-$ if $p\notin L$ (that is, $T$ consists of all examples for $L$ in the domain $\{0,1,2,\ldots,m\}$). Consider any $H\in{\mathcal{C}}_{m}$ that is consistent with $T$. Since $\{n_{0},\ldots,n_{k}\}\subseteq L\cap\{1,\ldots,m\}$, every $n_{i}$ is labelled $+$ in $T$ and hence belongs to $H$; the linearity of $H$ (resp. $L$) then implies that $L\subseteq H$. 
Furthermore, pick $\{n^{\prime}_{0},\ldots,n^{\prime}_{k^{\prime}}\}\subseteq\{1,\ldots,m\}$ so that $H$ is equal to $\{\vec{w}\cdot\vec{x}\mathrel{\mathop{\mathchar 58\relax}}\vec{x}\in\mathbb{N}_{0}^{k^{\prime}+1}\}$ for $\vec{w}=(n^{\prime}_{0},\ldots,n^{\prime}_{k^{\prime}})$ . Since every $n^{\prime}_{j}$ belongs to $H$ and is at most $m$ , the consistency of $H$ with $T$ implies that $\{n^{\prime}_{0},\ldots,n^{\prime}_{k^{\prime}}\}\subseteq L$ and hence, by the linearity of $L$ , $H\subseteq L$ . Therefore $H=L$ and so $T$ is indeed a teaching set for $L$ w.r.t. ${\mathcal{C}}_{m}$ .
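The argument above can be sanity-checked by brute force for small $m$. The following is a minimal sketch, not part of the original proof: the function names `linear_set` and `check_teaching_sets` and the cut-off `bound` are illustrative assumptions, and linear sets are only compared up to `bound`. It enumerates every nonempty generator set inside $\{1,\ldots,m\}$ and checks that two members of ${\mathcal{C}}_{m}$ with the same labels on $\{0,\ldots,m\}$ coincide.

```python
from itertools import combinations

def linear_set(generators, bound):
    """Numbers in {0, ..., bound} that are nonnegative integer combinations of the generators."""
    reachable = [False] * (bound + 1)
    reachable[0] = True
    for n in range(1, bound + 1):
        reachable[n] = any(g <= n and reachable[n - g] for g in generators)
    return frozenset(n for n in range(bound + 1) if reachable[n])

def check_teaching_sets(m, bound):
    """True if members of C_m that agree on {0, ..., m} also agree on {0, ..., bound}."""
    by_labels = {}
    for size in range(1, m + 1):
        for gens in combinations(range(1, m + 1), size):
            full = linear_set(gens, bound)
            labels = frozenset(n for n in full if n <= m)
            # setdefault keeps the first language seen with these labels
            if by_labels.setdefault(labels, full) != full:
                return False
    return True
```

For instance, `check_teaching_sets(6, 100)` examines all 63 generator sets for $m=6$; the check is finite, so it illustrates the claim rather than proving it.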

Now it is shown that $\mbox{PBTD}({\mathcal{C}}_{m})=\Omega(m)$ . We reuse the construction in the proof of [18, Lemma 29]. Assume that $m\geq 6$ , and set $m^{\prime}=\left\lfloor\displaystyle\frac{m}{3}\right\rfloor$ . Let ${\mathcal{F}}$ be the class $\{{\left\langle\{m^{\prime}\}\cup\{p_{i}\mathrel{\mathop{\mathchar 58\relax}}1\leq i\leq m^{\prime}-1\}\right\rangle}\mathrel{\mathop{\mathchar 58\relax}}(\forall i\in\{1,\ldots,m^{\prime}-1\})[p_{i}\in\{m^{\prime}+i,2m^{\prime}+i\}]\}$ . Note that ${\mathcal{F}}\subseteq{\mathcal{C}}_{m}$ . Furthermore, every member of ${\mathcal{F}}$ is of the shape $\{0,m^{\prime}\}\cup\{p_{i}\mathrel{\mathop{\mathchar 58\relax}}1\leq i\leq m^{\prime}-1\}\cup\{x\mathrel{\mathop{\mathchar 58\relax}}x\geq 2m^{\prime}\}$ , where $p_{i}\in\{m^{\prime}+i,2m^{\prime}+i\}$ for all $i\in\{1,\ldots,m^{\prime}-1\}$ . Thus the TD of every member of ${\mathcal{F}}$ is at least $m^{\prime}-1$ , and therefore $\mbox{PBTD}({\mathcal{C}}_{m})\geq\mbox{PBTD}({\mathcal{F}})\geq m^{\prime}-1$ . This establishes that $\mbox{TD}({\mathcal{C}}_{m})=\Theta(m)$ and $\mbox{PBTD}({\mathcal{C}}_{m})=\Theta(m)$ .
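The claimed shape of the members of ${\mathcal{F}}$ can be checked mechanically for small $m$. The sketch below is only an illustration under stated assumptions: the names `generated`, `family_F` and `has_expected_shape` are hypothetical, and each linear set is inspected only up to a finite `bound`.

```python
from itertools import product

def generated(generators, bound):
    """Numbers in {0, ..., bound} that are nonnegative integer combinations of the generators."""
    reachable = [False] * (bound + 1)
    reachable[0] = True
    for n in range(1, bound + 1):
        reachable[n] = any(g <= n and reachable[n - g] for g in generators)
    return {n for n in range(bound + 1) if reachable[n]}

def family_F(m):
    """Generator tuples (m', p_1, ..., p_{m'-1}) with p_i in {m'+i, 2m'+i} and m' = floor(m/3)."""
    mp = m // 3
    choices = [(mp + i, 2 * mp + i) for i in range(1, mp)]
    for picks in product(*choices):
        yield (mp,) + picks

def has_expected_shape(gens, bound):
    """Compare the generated set with {0, m'} ∪ {p_i} ∪ {x : x >= 2m'} up to bound."""
    mp = gens[0]
    expected = {0, mp} | set(gens[1:]) | set(range(2 * mp, bound + 1))
    return generated(gens, bound) == expected
```

With $m=9$ one gets $m^{\prime}=3$ and the four generator sets $\{3,4,5\}$, $\{3,4,8\}$, $\{3,7,5\}$ and $\{3,7,8\}$; any two of them differ on some element below $2m^{\prime}$, which is what forces one example per index $i$.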

Assertion (ii). Suppose $\{0,1\}\subseteq\Sigma$ . We first show that $\mbox{PBTD}(\mbox{NC}\Pi^{z}_{\infty,m})=1$ . Let $\prec$ be the preference relation on $\mbox{NC}\Pi^{z}_{\infty,m}$ defined according to the following hierarchy, in order of decreasing priority. Suppose $\pi$ and $\tau$ are non-cross patterns in canonical form belonging to $\mbox{NC}\Pi^{z}_{\infty,m}$ . (Here “prefer $\pi$ to $\tau$ ” means $\tau\prec\pi$ .)

Rule 1:

With highest priority: prefer $\pi$ to $\tau$ if $L(\pi)\neq L(x_{0})$ and $L(\tau)=L(x_{0})$ .

Rule 2:

With second highest priority: suppose both $\pi$ and $\tau$ contain at least two distinct variables; prefer $\pi$ to $\tau$ if $\pi$ has fewer variables than $\tau$ .

Rule 3:

With third highest priority: prefer $\pi$ to $\tau$ if $L(\pi)\subset L(\tau)$ .

Suppose $\pi=x_{0}^{n_{0}}\ldots x_{k}^{n_{k}}$ , where $n_{0},\ldots,n_{k}\in\mathbb{N}$ . If there is some $i$ with $n_{i}=1$ , then $\pi$ has the teaching set $\{(0,+)\}$ w.r.t. $\mbox{NC}\Pi^{z}_{\infty,m}$ . Suppose now that $n_{i}\geq 2$ for all $i$ . Let $T=\{(w_{1},+)\}$ , where

$$w_{1}\mathrel{\mathop{\mathchar 58\relax}}=(01)^{n_{0}}(0^{2}1)^{n_{1}}\ldots(0^{k+1}1)^{n_{k}}.$$

Let $\tau\mathrel{\mathop{\mathchar 58\relax}}=y_{0}^{m_{0}}\ldots y_{\ell}^{m_{\ell}}$ denote any pattern in $\mbox{NC}\Pi^{z}_{\infty,m}$ that is consistent with $T$ and $\tau\not\prec\pi$ . By Rule 1, $m_{i}\geq 2$ for all $i\in\{0,\ldots,\ell\}$ , that is, $L(\tau)\neq L(x_{0})$ . By Lemma 13, the consistency of $\tau$ with $(w_{1},+)$ implies that $\tau$ is equivalent to $x_{0}$ or every variable of $\tau$ occurs at least twice and for each $j\in\{0,\ldots,k\}$ , there are nonnegative integers $s_{j,0},\ldots,s_{j,l_{j}}$ and $i_{j,0},i_{j,1},\ldots,i_{j,l_{j}}\in\{0,\ldots,\ell\}$ with $i_{j,h}<i_{j^{\prime},h^{\prime}}$ whenever $j<j^{\prime}$ or $j=j^{\prime}\wedge h<h^{\prime}$ such that $\sum_{h=0}^{l_{j}}s_{j,h}m_{i_{j,h}}=n_{j}$ . In particular, since $L(\tau)\neq L(x_{0})$ , $\tau$ contains at least $k+1$ variables. By Rule 2, $\tau$ must contain exactly $k+1$ variables. It follows that $\tau$ is equivalent to $x_{0}^{n^{\prime}_{0}}x_{1}^{n^{\prime}_{1}}\ldots x_{k}^{n^{\prime}_{k}}$ , where, for each $i\in\{0,\ldots,k\}$ , $n^{\prime}_{i}\mid n_{i}$ . If there were a least $i^{\prime}\in\{0,\ldots,k\}$ such that $n^{\prime}_{i^{\prime}}<n_{i^{\prime}}$ (that is, $n^{\prime}_{i^{\prime}}$ properly divides $n_{i^{\prime}}$ ), then $L(\pi)\subset L(\tau)$ and so $\tau\prec\pi$ by Rule 3, contradicting the choice of $\tau$ . Thus $n^{\prime}_{i}=n_{i}$ for all $i\in\{0,\ldots,k\}$ and therefore $L(\tau)=L(\pi)$ , as required.

Next, it will be shown that $\mbox{TD}(\mbox{NC}\Pi^{z}_{\infty,m})$ is at most $2$ plus the number of prime powers (including primes) less than $m$ ; this is equal to $2+\sum_{i=1}^{\lfloor\log(m-1)\rfloor}\varrho\left((m-1)^{\frac{1}{i}}\right)$ , where $\varrho(x)$ denotes the number of primes less than or equal to $x$ . As observed earlier, the pattern $x_{0}$ can be taught with the single example $(0,+)$ . Suppose $\pi=x_{0}^{n_{0}}\ldots x_{k}^{n_{k}}$ , where $n_{i}\geq 2$ for all $i\in\{0,\ldots,k\}$ . We build a teaching set $T$ consisting of the following examples; $\eta\mathrel{\mathop{\mathchar 58\relax}}=y_{0}^{m_{0}}\ldots y_{\ell}^{m_{\ell}}$ will denote any non-cross pattern (in canonical form) in $\mbox{NC}\Pi^{z}_{\infty,m}$ that is consistent with $T$ . First, put $(v_{1},+)$ into $T$ , where

$$v_{1}\mathrel{\mathop{\mathchar 58\relax}}=(01)^{n_{0}}(0^{2}1)^{n_{1}}\ldots(0^{k+1}1)^{n_{k}}.$$

According to Lemma 13, the consistency of $\eta$ with $(v_{1},+)$ implies that for each $j\in\{0,\ldots,k\}$ , there are nonnegative integers $s_{j,0},\ldots,s_{j,l_{j}}$ and $i_{j,0},\ldots,i_{j,l_{j}}\in\{0,\ldots,\ell\}$ such that $i_{j,h}<i_{j^{\prime},h^{\prime}}$ iff $j<j^{\prime}$ or $j=j^{\prime}\wedge h<h^{\prime}$ , and $\sum_{r=0}^{l_{j}}s_{j,r}m_{i_{j,r}}=n_{j}$ . Second, define

$$v_{2}\mathrel{\mathop{\mathchar 58\relax}}=(01)^{m!}(0^{2}1)^{m!}\ldots(0^{k+2}1)^{m!}$$

and put $(v_{2},-)$ into $T$ . Note that by Lemma 13, $v_{2}$ is indeed a negative example for $\pi$ because any pattern $\pi^{\prime}$ with $v_{2}\in L(\pi^{\prime})$ is equivalent to $x_{0}$ or it contains at least $k+2$ variables that occur at least twice. Furthermore, Lemma 13 also implies that $\eta$ is not equivalent to $x_{0}$ and that $\eta$ contains at most $k+1$ variables. Since the consistency of $\eta$ with $(v_{1},+)$ implies that $\eta$ contains at least $k+1$ distinct variables, it follows that $\eta$ contains exactly $k+1$ variables, each of which occurs at least twice. That is to say, $\eta$ is of the shape $x_{0}^{n^{\prime}_{0}}x_{1}^{n^{\prime}_{1}}\ldots x_{k}^{n^{\prime}_{k}}$ , where, for each $i\in\{0,\ldots,k\}$ , $n^{\prime}_{i}\mid n_{i}$ . It remains to ensure that $n^{\prime}_{i}$ does not properly divide $n_{i}$ for any $i\in\{0,\ldots,k\}$ .

Let $\{q_{0}^{r_{0}},\ldots,q_{\ell^{\prime}}^{r_{\ell^{\prime}}}\}$ be the set of all prime powers that are maximal proper prime power factors of the $n_{i}$'s; in other words, for every $j\in\{0,\ldots,\ell^{\prime}\}$ , there is some $j_{0}\in\{0,\ldots,k\}$ with $q_{j}^{r_{j}}\mid n_{j_{0}}$ and $q_{j}^{r_{j}}\neq n_{j_{0}}$ but $q_{j}^{r_{j}+1}\nmid n_{j_{0}}$ . For each $j\in\{0,\ldots,\ell^{\prime}\}$ , let $d_{j}$ be the number of $i$'s between $0$ and $k$ (inclusive) such that $q_{j}^{r_{j}}$ does not divide $n_{i}$ , and set $e_{j}=q_{j}^{r_{j}-1}\cdot\prod_{p~{}\mbox{is prime}\wedge q_{j}\neq p\leq m}p^{\left\lfloor\frac{\log(m)}{\log(p)}\right\rfloor}$ . Now define

[TABLE]

for every $j\in\{0,\ldots,\ell^{\prime}\}$ , and put $(t_{j},-)$ into $T$ .

We first show that $t_{j}\notin L(\pi)$ for every $j\in\{0,\ldots,\ell^{\prime}\}$ . This will be achieved by means of a proof by contradiction; assuming that $t_{j}\in L(\pi)$ , one can construct a one-one mapping $F$ from $\{1,\ldots,d_{j}+1\}$ to $\{i\in\{0,\ldots,k\}\mathrel{\mathop{\mathchar 58\relax}}q_{j}^{r_{j}}\nmid n_{i}\}$ as follows. Given any $i\in\{1,\ldots,d_{j}+1\}$ , it follows from Lemma 13 that there are nonnegative integers $s_{i,0},\ldots,s_{i,l_{i}}$ and $u_{i,0},\ldots,u_{i,l_{i}}\in\{0,\ldots,k\}$ such that $u_{i,g}<u_{i^{\prime},g^{\prime}}$ iff $i<i^{\prime}$ or $i=i^{\prime}$ and $g<g^{\prime}$ , and $\sum_{k=0}^{l_{i}}s_{i,k}n_{u_{i,k}}=e_{j}$ . Note that since $q_{j}^{r_{j}}\nmid e_{j}$ , there must exist a least $h_{i}$ such that $q_{j}^{r_{j}}\nmid n_{u_{i,h_{i}}}$ . Define $F(i)=u_{i,h_{i}}$ . Then $range(F)\subseteq\{i\in\{0,\ldots,k\}\mathrel{\mathop{\mathchar 58\relax}}q_{j}^{r_{j}}\nmid n_{i}\}$ ; furthermore, $i<i^{\prime}\Rightarrow u_{i,h_{i}}<u_{i^{\prime},h_{i^{\prime}}}\Leftrightarrow F(i)<F(i^{\prime})$ . Thus $F$ is indeed a one-one mapping, so that

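$$d_{j}+1=|\{1,\ldots,d_{j}+1\}|\leq|\{i\in\{0,\ldots,k\}\mathrel{\mathop{\mathchar 58\relax}}q_{j}^{r_{j}}\nmid n_{i}\}|=d_{j},$$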

a contradiction.

To complete the proof, it will be shown that if there were a least $i^{\prime\prime}\in\{0,\ldots,k\}$ such that $n^{\prime}_{i^{\prime\prime}}$ properly divides $n_{i^{\prime\prime}}$ (as noted above, $\eta$ is of the shape $x_{0}^{n^{\prime}_{0}}x_{1}^{n^{\prime}_{1}}\ldots x_{k}^{n^{\prime}_{k}}$ , where $n^{\prime}_{i}\mid n_{i}$ for all $i\in\{0,\ldots,k\}$ ), then there would be a least $j^{\prime}\in\{0,\ldots,\ell^{\prime}\}$ such that $t_{j^{\prime}}\in L(\eta)$ . Suppose such an $i^{\prime\prime}$ did exist. Then there must be a least $j^{\prime\prime}\in\{0,\ldots,\ell^{\prime}\}$ for which $q_{j^{\prime\prime}}^{r_{j^{\prime\prime}}}\mid n_{i^{\prime\prime}}$ and $n^{\prime}_{i^{\prime\prime}}\mid n_{i^{\prime\prime}}q_{j^{\prime\prime}}^{-1}$ . Hence the number of $i$'s between $0$ and $k$ (inclusive) such that $q_{j^{\prime\prime}}^{r_{j^{\prime\prime}}}\nmid n^{\prime}_{i}$ is at least $1$ more than the number of $i$'s between $0$ and $k$ (inclusive) such that $q_{j^{\prime\prime}}^{r_{j^{\prime\prime}}}\nmid n_{i}$ , and the number of $j_{1}$'s between $0$ and $k$ (inclusive) such that $n^{\prime}_{j_{1}}\mid e_{j^{\prime\prime}}$ is at least $d_{j^{\prime\prime}}+1$ . Consequently, $t_{j^{\prime\prime}}\in L(\eta)$ , which is the desired contradiction. In conclusion, $n^{\prime}_{i}=n_{i}$ for all $i\in\{0,\ldots,k\}$ and thus $\eta$ is equivalent to $\pi$ ; this establishes that $T$ is a teaching set for $\pi$ w.r.t. $\mbox{NC}\Pi^{z}_{\infty,m}$ .
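The size bound used in this part of the proof, $2$ plus the number of prime powers less than $m$ , is easy to tabulate. The sketch below is an illustration only; the function names are hypothetical, and it assumes that $\log$ in the summation limit is the base-2 logarithm. It counts the prime powers directly and via the sum of prime counts, using integer arithmetic throughout to avoid rounding issues with $(m-1)^{1/i}$ .

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p in range(n + 1) if is_prime[p]]

def prime_powers_below(m):
    """All q^r with q prime, r >= 1 and q^r < m, in increasing order."""
    powers = []
    for q in primes_up_to(m - 1):
        power = q
        while power < m:
            powers.append(power)
            power *= q
    return sorted(powers)

def count_via_formula(m):
    """Sum over i = 1..floor(log2(m-1)) of the number of primes p with p^i <= m-1."""
    total, i = 0, 1
    while 2 ** i <= m - 1:
        total += sum(1 for p in primes_up_to(m - 1) if p ** i <= m - 1)
        i += 1
    return total

def td_upper_bound(m):
    """The bound 2 + (number of prime powers less than m)."""
    return 2 + len(prime_powers_below(m))
```

For $m=9$ the prime powers below $m$ are $2,3,4,5,7,8$ , so the general bound evaluates to $8$ ; a particular pattern usually needs far fewer examples, since only the maximal proper prime power factors of its own exponents contribute.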

To prove that $\mbox{TD}(\mbox{NC}\Pi^{z}_{\infty,m})\geq\max(\{\omega(n)\mathrel{\mathop{\mathchar 58\relax}}n\leq m\})$ , pick any $n\leq m$ such that $\omega(n)\geq\omega(n^{\prime})$ for all $n^{\prime}\leq m$ . Let $p_{1},\ldots,p_{\omega(n)}$ be all the distinct prime factors of $n$ , and consider the non-cross pattern $\theta\mathrel{\mathop{\mathchar 58\relax}}=x_{1}^{\prod_{i=1}^{\omega(n)}p_{i}}$ . For each $i\in\{1,\ldots,\omega(n)\}$ , set $\theta_{i}\mathrel{\mathop{\mathchar 58\relax}}=x_{1}^{\prod_{j\neq i}p_{j}}$ . We note that $\theta\in\mbox{NC}\Pi^{z}_{\infty,m}$ and for all $i\in\{1,\ldots,\omega(n)\}$ , $\theta_{i}\in\mbox{NC}\Pi^{z}_{\infty,m}$ . Furthermore, $L(\theta)\subset L(\theta_{i})$ for every $i$ , and whenever $i\neq j$ , $L(\theta_{i})\cap L(\theta_{j})\subseteq L(\theta)$ . Hence no positive example for $\theta$ rules out any $\theta_{i}$ , while every negative example for $\theta$ belongs to at most one $L(\theta_{i})$ and so rules out at most one of the $\theta_{i}$ . It follows that $\mbox{TD}(\theta,\mbox{NC}\Pi^{z}_{\infty,m})\geq\omega(n)$ .

O Example 15

Suppose $\{0,1\}\subseteq\Sigma$ . Let $\pi=x_{1}^{4}x_{2}^{8}x_{3}^{9}$ . There are $3$ maximal proper prime power factors of $4,8$ and $9$ , namely, $2,4$ and $3$ , and so by the proof of Theorem 14, the TD of $\pi$ w.r.t. $\mbox{NC}\Pi^{|\Sigma|}_{\infty,9}$ is at most $2+3=5$ . However, one can build a teaching set $T$ of size $4$ for $\pi$ as follows. As in the proof of Theorem 14(ii), put $(v_{1},+)$ and $(v_{2},-)$ into $T$ , where $v_{1}\mathrel{\mathop{\mathchar 58\relax}}=(01)^{4}(001)^{8}(0001)^{9}$ and $v_{2}\mathrel{\mathop{\mathchar 58\relax}}=(01)^{9!}(001)^{9!}(0001)^{9!}(00001)^{9!}$ . Arguing as in the proof of Theorem 14(ii), any pattern $\tau\in\mbox{NC}\Pi^{|\Sigma|}_{\infty,9}$ that is consistent with both $(v_{1},+)$ and $(v_{2},-)$ must be of the shape $x_{1}^{k_{1}}x_{2}^{k_{2}}x_{3}^{k_{3}}$ , where $k_{1}\mid 4,k_{2}\mid 8$ and $k_{3}\mid 9$ . Thus at this stage, it suffices to distinguish $\pi$ from the three patterns $x_{1}^{2}x_{2}^{8}x_{3}^{9}$ , $x_{1}^{4}x_{2}^{4}x_{3}^{9}$ and $x_{1}^{4}x_{2}^{8}x_{3}^{3}$ . Put $(v_{3},-)$ into $T$ , where

$$v_{3}\mathrel{\mathop{\mathchar 58\relax}}=0^{8}(10^{6})^{8}0^{3}.$$

To see that $v_{3}\notin L(\pi)$ , assume, by way of contradiction, that some morphism $\psi\mathrel{\mathop{\mathchar 58\relax}}X^{*}\mapsto\Sigma^{*}$ satisfies $\psi(\pi)=v_{3}$ . Since $v_{3}$ is not a $4$ -th, $8$ -th or $9$ -th power, at least two variables of $\pi$ are not mapped to the empty word by $\psi$ .

First, suppose $\psi(x_{1})\neq\varepsilon$ . Then $\psi(x_{1}^{4})$ must be equal to either $0^{8}$ or $0^{4}$ . If $\psi(x_{1}^{4})=0^{8}$ , then, since $(10^{6})^{8}0^{3}$ is not a $9$ -th power, $\psi(x_{2}^{8})=(10^{6})^{8}$ and therefore $\psi(x_{3}^{9})=0^{3}$ , which is impossible. The argument for the case $\psi(x_{1}^{4})=0^{4}$ is similar. Second, suppose $\psi(x_{1})=\varepsilon$ . Then $\psi(x_{2}^{8})\neq\varepsilon$ and $\psi(x_{3}^{9})\neq\varepsilon$ . Hence $\psi(x_{2}^{8})=0^{8}$ and so $\psi(x_{3}^{9})=(10^{6})^{8}0^{3}$ , which is impossible.

On the other hand, $v_{3}\in L(x_{1}^{2}x_{2}^{8}x_{3}^{9})\cap L(x_{1}^{4}x_{2}^{8}x_{3}^{3})$ . Thus it only remains to distinguish $\pi$ from $x_{1}^{4}x_{2}^{4}x_{3}^{9}$ , and this may be done with a single negative example, say $v_{4}\mathrel{\mathop{\mathchar 58\relax}}=(01)^{4}(001)^{4}(0001)^{9}$ .
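The membership claims in this example can be confirmed by exhaustive search. The following is a minimal sketch under stated assumptions: the function name `in_noncross_language` is hypothetical, a non-cross pattern $x_{1}^{e_{1}}\ldots x_{k}^{e_{k}}$ with pairwise distinct variables is represented by its exponent tuple, and $v_{3}$ is taken to be $0^{8}(10^{6})^{8}0^{3}$ , the word analysed in the case distinction above. The word $v_{2}$ is omitted because of its length.

```python
def in_noncross_language(word, exponents, start=0):
    """Decide whether word[start:] equals u_1^{e_1} ... u_k^{e_k} for some words u_i.

    The u_i may be empty (erasing substitutions); exponents are positive integers.
    """
    if not exponents:
        return start == len(word)
    e, rest = exponents[0], exponents[1:]
    remaining = len(word) - start
    # Try every possible length for the image of the current variable.
    for length in range(remaining // e + 1):
        block = word[start:start + length]
        if word[start:start + e * length] == block * e:
            if in_noncross_language(word, rest, start + e * length):
                return True
    return False

pi = (4, 8, 9)                                   # x_1^4 x_2^8 x_3^9
v1 = "01" * 4 + "001" * 8 + "0001" * 9           # positive example
v3 = "0" * 8 + ("1" + "0" * 6) * 8 + "0" * 3     # assumed form of v_3
v4 = "01" * 4 + "001" * 4 + "0001" * 9           # last negative example

print(in_noncross_language(v1, pi))              # True
print(in_noncross_language(v3, pi))              # False
print(in_noncross_language(v3, (2, 8, 9)))       # True
print(in_noncross_language(v3, (4, 8, 3)))       # True
print(in_noncross_language(v4, pi))              # False
print(in_noncross_language(v4, (4, 4, 9)))       # True
```

One witness for $v_{3}\in L(x_{1}^{2}x_{2}^{8}x_{3}^{9})$ maps $x_{1}\mapsto 0$ , $x_{2}\mapsto 0^{6}1$ and $x_{3}\mapsto 0$ ; one for $v_{3}\in L(x_{1}^{4}x_{2}^{8}x_{3}^{3})$ maps $x_{1}\mapsto 00$ , $x_{2}\mapsto 10^{6}$ and $x_{3}\mapsto 0$ .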

P Remark on Theorem 14

Establishing the exact TD of any given pattern in $\mbox{NC}\Pi^{z}_{\infty,m}$ (for any fixed finite $z\geq 2$ and $m\geq 2$ ) seems to be quite difficult in general. We highlight a potential difficulty faced when one tries to apply a natural method to determine a lower bound on the TD of such a pattern. Suppose $\pi\mathrel{\mathop{\mathchar 58\relax}}=x_{1}^{n_{1}}\ldots x_{k}^{n_{k}}$ , where $n_{1},\ldots,n_{k}\geq 2$ and $k\geq 1$ . For each maximal proper prime power factor $q^{r}$ of $n_{i}$ , let $\pi_{i,q}$ be the pattern derived from $\pi$ by replacing $x_{i}^{n_{i}}$ with $x_{i}^{q}$ , and let $P$ be the finite class of all patterns so obtained. For the sake of convenience, assume the variables of patterns in $P$ are renamed so that for all $P_{i},P_{j}\in P$ with $i\neq j$ , $\mbox{Var}(P_{i})\cap\mbox{Var}(P_{j})=\emptyset$ and $\mbox{Var}(P_{i})\cap\{x_{1},\ldots,x_{k}\}=\emptyset$ . For each partition $\mathcal{P}$ of $P$ and every member $\{P_{1},\ldots,P_{d}\}$ of $\mathcal{P}$ , let $y_{1},\ldots,y_{\ell^{\prime}}$ be all the variables occurring in $P_{1},\ldots,P_{d}$ . Then $\pi$ is distinguishable from $\{P_{1},\ldots,P_{d}\}$ with a single negative example iff the sentence

[TABLE]

holds. As implied by the work of Karhumäki et al. [23], there is a word equation $E$ with variables in $\mbox{Var}(P_{1})\cup\mbox{Var}(\pi)\cup\{z_{1},\ldots,z_{\ell^{\prime\prime}}\}$ (for some additional variables $z_{1},\ldots,z_{\ell^{\prime\prime}}$ ) such that the inequation $P_{1}(y_{1},\ldots,y_{\ell^{\prime}})\neq\pi(x_{1},\ldots,x_{k})$ is equivalent to $(\exists z_{1},\ldots,z_{\ell^{\prime\prime}})E$ . Consequently, (16) is equivalent to a sentence whose prenex normal form has quantifier prefix $\exists\forall\exists$ (call this an $\exists\forall\exists$ -sentence; a $\forall\exists$ -sentence is defined analogously) over a conjunction of word equations. If a decidability procedure exists for all such $\exists\forall\exists$ -sentences, then one could decide whether or not $\pi$ is distinguishable from $\{P_{1},\ldots,P_{d}\}$ with a single example. More generally, one could find a largest number $f\leq|P|$ such that for all partitions $\mathcal{P}$ of $P$ of size $f^{\prime}<f$ (that is, $\mathcal{P}$ has exactly $f^{\prime}$ members), there is a member $\{P_{1},\ldots,P_{d}\}$ of $\mathcal{P}$ from which $\pi$ is not distinguishable with a single example. Then $f$ would be a lower bound on the teaching dimension of $\pi$ w.r.t. $\mbox{NC}\Pi^{z}_{\infty,m}$ . However, this method does not seem feasible because the set of all $\forall\exists$ -sentences over positive word equations (combinations of word equations using $\wedge$ or $\vee$ ) is already undecidable [15].

Q Proof of Theorem 16

Proof. We first compute $\mbox{PBTD}(\Pi^{\infty})$ . It will be assumed that every pattern $\pi$ in the present proof is succinct, i.e. $|\pi^{\prime}|\geq|\pi|$ for all $\pi^{\prime}$ such that $L(\pi^{\prime})=L(\pi)$ . Define a preference relation $\prec$ on $\Pi^{z}$ based on the following preference hierarchy, where $\pi$ and $\tau$ are any two given succinct patterns:

Rule 1:

With highest priority, prefer $\pi$ to $\tau$ (i.e. $\tau\prec\pi$ ) if $|\pi(\varepsilon)|>|\tau(\varepsilon)|$ .

Rule 2:

With second highest priority, prefer $\pi$ to $\tau$ (i.e. $\tau\prec\pi$ ) if $L(\pi)\subset L(\tau)$ .

Given any $\pi\in\Pi^{\infty}$ , one can construct a teaching set $T$ of size at most $2$ for $\pi$ w.r.t. $(\Pi^{z},\prec)$ as follows; $\tau$ will denote any pattern in $\Pi^{z}$ that is consistent with $T$ and $\tau\not\prec\pi$ . First, put $(\pi(\varepsilon),+)$ into $T$ . Since $\tau(\varepsilon)\sqsubseteq\pi(\varepsilon)$ , Rule 1 will ensure that $\tau(\varepsilon)=\pi(\varepsilon)$ , that is, $\pi$ and $\tau$ have identical constant parts. Second, suppose $\mbox{Var}(\pi)=\{x_{0},\ldots,x_{k-1}\}$ . Choose a set $\{a_{0},\ldots,a_{k-1}\}$ of $k$ distinct letters such that $\{a_{0},\ldots,a_{k-1}\}\cap\mbox{Const}(\pi)=\emptyset$ , and put $(\pi[x_{i}\rightarrow a_{i},0\leq i\leq k-1],+)$ into $T$ . By Theorem 17, the fact that $\pi[x_{i}\rightarrow a_{i},0\leq i\leq k-1]\in L(\tau)$ implies $L(\pi)\subseteq L(\tau)$ . By Rule 2, one has $L(\tau)=L(\pi)$ , as required.

To see that $\mbox{PBTD}(\Pi^{\infty})\geq 2$ , one may apply [17, Theorem 34]; according to this theorem, $\mbox{PBTD}(\Pi^{\infty})>1$ because $\Pi^{\infty}$ contains all constant patterns as well as infinitely many patterns that generate infinite languages.

Next, it is shown that $\mbox{PBTD}(\Pi^{1}_{\infty,m})=\Theta(m)$ . Suppose $\Sigma=\{0\}$ . It follows from Theorem 14 and the monotonicity of the PBTD [17, Lemma 6] that $\mbox{PBTD}(\Pi^{1}_{\infty,m})\geq\mbox{PBTD}(\mbox{NC}\Pi^{1}_{\infty,m})=\Theta(m)$ . For the upper bound, we observe that every pattern in $\Pi^{1}_{\infty,m}$ is equivalent to a pattern of the shape $0^{k}x_{1}^{n_{1}}\ldots x_{\ell}^{n_{\ell}}$ , where $k+\ell\geq 1$ and $1\leq n_{1}<\ldots<n_{\ell}\leq m$ (this follows from the fact that over unary alphabets, equivalence of two patterns is preserved under permutations of the patterns’ symbols and that any two terms of the shape $x_{i}^{n}x_{j}^{n}$ can be combined into a single term $x_{i}^{n}$ ). Define the preference relation $\prec$ on $\Pi^{1}_{\infty,m}$ as follows: for any $\pi,\pi^{\prime}\in\Pi^{1}_{\infty,m}$ , $\pi\prec\pi^{\prime}$ iff

•

$|\pi^{\prime}(\varepsilon)|>|\pi(\varepsilon)|$ , or

•

$\pi^{\prime}(\varepsilon)=\pi(\varepsilon)$ and $L(\pi^{\prime})\subset L(\pi)$ .

Suppose $\pi\in\Pi^{1}_{\infty,m}$ . If $\pi=0^{k}$ for some $k\geq 1$ , then $\pi$ can be taught w.r.t. $(\Pi^{1}_{\infty,m},\prec)$ using the single positive example $(0^{k},+)$ since all patterns whose languages contain $0^{k}$ must have a constant part of length at most $k=|\pi|$ , and $\pi$ is preferred to all patterns with a constant part of length less than $k$ as well as to all inequivalent patterns with constant part $0^{k}$ (whose languages properly contain $\{0^{k}\}$ ). Suppose $\pi=0^{k}x_{1}^{n_{1}}\ldots x_{\ell}^{n_{\ell}}$ for some $\ell\geq 1$ such that $n_{i}<n_{j}$ whenever $i<j$ . A teaching set for $\pi$ is $T\mathrel{\mathop{\mathchar 58\relax}}=\{(0^{k},+)\}\cup\{(0^{k+n_{i}},+)\mathrel{\mathop{\mathchar 58\relax}}i\in\{1,\ldots,\ell\}\}$ . Let $\tau$ denote any pattern in $\Pi^{1}_{\infty,m}$ that is consistent with $T$ and satisfies $\tau\not\prec\pi$ . The positive example $(0^{k},+)$ , together with $\tau\not\prec\pi$ , ensures that $\tau(\varepsilon)=\pi(\varepsilon)$ . Furthermore, since $0^{k+n_{i}}\in L(\tau)$ for all $i\in\{1,\ldots,\ell\}$ , it follows that $L(\pi)\subseteq L(\tau)$ , and so by the definition of $\prec$ , $L(\tau)=L(\pi)$ .
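The teaching set just described is simple to write down explicitly. The sketch below is an illustration under stated assumptions: a unary pattern $0^{k}x_{1}^{n_{1}}\ldots x_{\ell}^{n_{\ell}}$ is represented by the pair `(k, exponents)`, words are identified with their lengths, the function names are hypothetical, and only hypotheses with the same constant part $0^{k}$ are enumerated (the preference relation takes care of shorter constant parts). It checks, up to a finite bound, that every such hypothesis containing the examples also contains $L(\pi)$ .

```python
from itertools import combinations

def lengths(k, exponents, bound):
    """Lengths n <= bound of the words in L(0^k x_1^{n_1} ... x_l^{n_l}) over {0}."""
    reachable = [False] * (bound + 1)
    if k <= bound:
        reachable[k] = True
    for n in range(k + 1, bound + 1):
        reachable[n] = any(n - e >= k and reachable[n - e] for e in exponents)
    return {n for n in range(bound + 1) if reachable[n]}

def teaching_set(k, exponents):
    """The sample {(0^k, +)} together with (0^{k+n_i}, +) for every exponent n_i."""
    return [("0" * k, "+")] + [("0" * (k + e), "+") for e in sorted(set(exponents))]

def consistent_hypotheses_contain_target(k, exponents, m, bound):
    """Check L(pi) ⊆ L(tau), up to bound, for every tau = 0^k y_1^{e_1} ... consistent with T."""
    target = lengths(k, exponents, bound)
    sample = {len(word) for word, _ in teaching_set(k, exponents)}
    for size in range(m + 1):
        for candidate in combinations(range(1, m + 1), size):
            hypothesis = lengths(k, candidate, bound)
            if sample <= hypothesis and not target <= hypothesis:
                return False
    return True
```

For $\pi=0^{2}x_{1}^{3}x_{2}^{5}$ the sample consists of $0^{2}$ , $0^{5}$ and $0^{7}$ , so its size is $\ell+1\leq m+1$ , in line with the $O(m)$ upper bound.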

R Proof of Lemma 18

Proof. The “if” direction of the lemma follows from Condition (ii), Theorem 17 and the fact that $L(\pi)\subseteq L(\tau)$ (which is in turn implied by $\tau\in\pi\shuffle Y^{*}$ and $Y\cap\mbox{Var}(\tau)=\emptyset$ ). We prove the “only if” direction of the lemma.

Condition (i):

Assume, by way of contradiction, that $Y^{\prime}\delta$ were a prefix of $\tau$ . Fix some $\omega\in\Sigma\setminus\mbox{Const}(\pi)$ and $y\in\mbox{Var}(Y^{\prime})$ . Set $w=\tau[y\rightarrow\omega]$ . Then $w\in L(\tau)$ by construction; on the other hand, since $\pi$ starts with $\delta$ but $w$ starts with $\omega\neq\delta$ , $w\notin L(\pi)$ . The proofs that $\delta Y^{\prime}$ is not a suffix of $\tau$ and $\delta Y^{\prime}\delta^{\prime}$ is not a substring of $\tau$ are similar.

Condition (ii):

Note that Condition (i) implies $\pi$ is similar to $\tau$ . If $L(\tau)\subseteq L(\pi)$ , then (a) $|\Sigma|=\infty$ , (b) $\pi$ is similar to $\tau$ and (c) the second part of Theorem 17 together imply Condition (ii).

Condition (iii):

Let $h\mathrel{\mathop{\mathchar 58\relax}}(X\cup\Sigma)^{*}\mapsto(X\cup Y\cup\Sigma)^{*}$ be any constant-preserving morphism such that $h(\pi)=\tau$ . Let $p_{1},p_{2},\ldots,p_{n}$ be all the positions of $\pi$ that are occupied by variables, where $p_{1}<p_{2}<\ldots<p_{n}$ , and for all $j\in\{1,\ldots,n\}$ , let $x_{i_{j}}$ denote the variable at the $p_{j}^{th}$ position of $\pi$ . (For example, if $\pi=x_{1}0x_{1}0x_{2}x_{1}x_{3}x_{3}$ , then $i_{1}=i_{2}=1,i_{3}=2,i_{4}=1$ and $i_{5}=i_{6}=3$ .)

Suppose there is a least $j\in\{1,\ldots,n\}$ such that $h(x_{i_{j}})$ is not of the shape $Y_{1}x_{i_{j}}Y_{2}$ , where $Y_{1},Y_{2}\in Y^{*}$ . Note that $\mbox{Const}(h(x))=\emptyset$ for all $x\in\mbox{Var}(\pi)$ , for otherwise $h(\pi)$ would have more occurrences of constants than $\tau$ (by Condition (ii), $\tau$ is similar to $\pi$ ). Since $x_{i_{1}},x_{i_{2}},\ldots,x_{i_{n}}$ occur in $\tau$ in the same order as their appearance in $\pi$ , there exists some $j_{1}\in\{1,\ldots,n\}$ such that $j_{1}\geq j$ and $h(x_{i_{j_{1}}})\in Y^{*}$ . Now let $\pi^{\prime}$ be the pattern obtained from $\pi$ by deleting all occurrences of $x_{i_{j_{1}}}$ . Let $h^{\prime}\mathrel{\mathop{\mathchar 58\relax}}(X\cup\Sigma)^{*}\mapsto(X\cup\Sigma)^{*}$ be a constant-preserving morphism such that $h^{\prime}(x)=(h(x)){\big{|}}_{\Sigma\cup\mbox{Var}(\pi)}$ for all $x\in\mbox{Var}(\pi)$ . Then one has

[TABLE]

Consequently, by Theorem 17, $L(\pi)\subseteq L(\pi^{\prime})$ . By construction, $L(\pi^{\prime})\subseteq L(\pi)$ and so $L(\pi)=L(\pi^{\prime})$ . But $\pi^{\prime}$ is a pattern shorter than $\pi$ that generates the same language as $\pi$ , contrary to the hypothesis that $\pi$ is succinct.

S Proof of Lemma 19

Proof. We split the analysis into two cases.

Case 1:

There are $Y^{\prime}\in Y^{+}$ and $\delta,\delta^{\prime}\in\Sigma$ such that at least one of the following holds: (i) $Y^{\prime}\delta$ is a prefix of $\tau$ , (ii) $\delta Y^{\prime}$ is a suffix of $\tau$ or (iii) $\delta Y^{\prime}\delta^{\prime}$ is a substring of $\tau$ . Suppose (i) holds. Pick some $y\in\mbox{Var}(Y^{\prime})$ and let $\tau^{\prime}$ be the restriction of $\tau$ to $\Sigma\cup\mbox{Var}(\pi)\cup\{y\}$ . We show $L(\tau^{\prime})\supset L(\pi)$ . By construction, $L(\tau^{\prime})\supseteq L(\pi)$ . Fix some $\omega\in\Sigma\setminus\mbox{Const}(\pi)$ , and set $w=\tau^{\prime}[y\rightarrow\omega]$ . Then $w\in L(\tau^{\prime})$ . Further, since $\pi$ starts with $\delta$ but $w$ starts with $\omega\neq\delta$ , one has $w\notin L(\pi)$ . A similar proof applies if (ii) or (iii) holds.

Case 2:

Not Case 1. Let $x_{1}$ (resp. $x_{n}$ ) be the leftmost (resp. rightmost) variable of $\pi$ . For each $x\in\mbox{Var}(\pi)$ , let $Y^{x}_{\ell}$ (resp. $Y^{x}_{r}$ ) be the longest substring $Z$ in $Y^{*}$ such that every occurrence of $x$ in $\tau$ is immediately preceded (resp. succeeded) by $Z$ . For each occurrence of $x\in\mbox{Var}(\pi)$ , identify the unique $y\in Y$ such that $y$ immediately precedes the corresponding occurrence of $Y^{x}_{\ell}x$ , and put $y$ into $S^{x}$ (if no such $y$ exists, then nothing needs to be done). Similarly, for each occurrence of $x\in\mbox{Var}(\pi)$ , identify the unique $z\in Y$ such that $z$ immediately succeeds the corresponding occurrence of $xY^{x}_{r}$ , and put $z$ into $S^{x}$ (again, nothing needs to be done if no such $z$ exists). Further, if the last (resp. first) symbol occurring in $\tau$ is some $y\in Y$ , put $y$ into $S^{x_{n}}$ (resp. $S^{x_{1}}$ ). Lastly, for every substring of $\tau$ of the shape $xY^{\prime}\delta$ (resp. $\delta Y^{\prime}x$ ), where $\delta\in\Sigma,Y^{\prime}\in Y^{+}$ and $x\in\mbox{Var}(\pi)$ , put the last (resp. first) symbol of $Y^{\prime}$ into $S^{x}$ .

Let $\tau^{\prime}$ be the restriction of $\tau$ to $\Sigma\cup\mbox{Var}(\pi)\cup\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ . Note that $\tau^{\prime}\in\Pi^{\infty}_{4mk+|\pi|+2,m}$ and $\tau^{\prime}=\tau{\big{|}}_{\Sigma\cup\mbox{Var}(\pi)\cup S}$ for some finite $S\subset Y$ . Suppose there is a constant-preserving morphism $g\mathrel{\mathop{\mathchar 58\relax}}(X\cup\Sigma)^{*}\mapsto(X\cup\Sigma)^{*}$ such that $g(\pi)=\tau^{\prime}$ . We show that this implies the existence of a constant-preserving morphism $g^{\prime}\mathrel{\mathop{\mathchar 58\relax}}(X\cup\Sigma)^{*}\mapsto(X\cup\Sigma)^{*}$ such that $g^{\prime}(\pi)=\tau$ . It will then follow that whenever $L(\pi)\subset L(\tau)$ , one has $L(\pi)\subset L(\tau^{\prime})$ , as required. By Lemma 18, every occurrence of any $y\in Y$ in $\tau^{\prime}$ is contained in a substring of $\tau^{\prime}$ of the shape $xY^{\prime}$ or $Y^{\prime}x$ for some $Y^{\prime}\in Y^{+}$ , and for every $x\in\mbox{Var}(\pi)$ , there are $Z^{x}_{\ell},Z^{x}_{r}\in Y^{*}$ for which $I_{g,\pi}$ maps the position $p_{x}$ of the $t^{th}$ occurrence of $x$ in $\pi$ (for any $t\leq\left|\pi{\big{|}}_{x}\right|$ ) to an interval $J_{p_{x}}$ of positions of $\tau^{\prime}$ corresponding to an occurrence of $Z^{x}_{\ell}xZ^{x}_{r}$ in $\tau^{\prime}$ such that the position of the $t^{th}$ occurrence of $x$ in $\tau^{\prime}$ belongs to $J_{p_{x}}$ . Suppose $\tau=\rho_{1}x_{1}\cdots x_{n}\rho_{n}$ and $\tau^{\prime}=\rho^{\prime}_{1}x_{1}\cdots x_{n}\rho^{\prime}_{n}$ , where $\rho_{1},\rho^{\prime}_{1},\rho_{n},\rho^{\prime}_{n}\in Y^{*}$ .

Our first step is to show $Y^{x_{1}}_{\ell}=\rho_{1}$ and $Y^{x_{n}}_{r}=\rho_{n}$ . So assume, by way of contradiction, that at least one of the following holds: (i) $Y^{x_{1}}_{\ell}\neq\rho_{1}$ or (ii) $Y^{x_{n}}_{r}\neq\rho_{n}$ . Suppose (i) holds. Since $Y^{x_{1}}_{\ell}\neq\rho_{1}$ , there is a unique $y\in Y$ immediately preceding the first occurrence of $Y^{x_{1}}_{\ell}x_{1}$ in $\tau$ . Note that $Z^{x_{1}}_{\ell}=\rho^{\prime}_{1}=\rho_{1}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ . Furthermore, there is another substring of $\tau$ of the shape $sY^{x_{1}}_{\ell}x_{1}$ , where $s\in\left(Y\setminus\{y\}\right)\cup\mbox{Var}(\pi)\cup\Sigma$ . If $s\in Y\setminus\{y\}$ , then $s\in S^{x_{1}}$ and so $Z^{x_{1}}_{\ell}\neq\rho_{1}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}=\rho^{\prime}_{1}$ , a contradiction. If $s\in\mbox{Var}(\pi)\cup\Sigma$ , then one has

[TABLE]

which again shows $Z^{x_{1}}_{\ell}\neq\rho^{\prime}_{1}$ , a contradiction. A similar proof shows that (ii) contradicts the definition of $Y^{x_{n}}_{r}$ .

The next step is to show that for every substring of $\tau$ of the shape $\delta Y^{\prime}x$ (resp. $xY^{\prime}\delta$ ), where $\delta\in\Sigma,Y^{\prime}\in Y^{*}$ and $x\in\mbox{Var}(\pi)$ , one has $Y^{x}_{\ell}=Y^{\prime}$ (resp. $Y^{x}_{r}=Y^{\prime}$ ). The proof is similar to that in the preceding paragraph. Suppose there is a substring of $\tau$ of the shape $\delta Y^{\prime}x$ and $Y^{x}_{\ell}\neq Y^{\prime}$ . There is a unique $y\in Y$ at the $\left(|Y^{\prime}|-|Y^{x}_{\ell}|\right)^{th}$ position of $Y^{\prime}$ , and so by the definition of $S^{x}$ one has $y\in S^{x}$ . Then, as argued in the preceding paragraph, one has $Z^{x}_{\ell}\neq Y^{\prime}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ , and so such a $g$ as described earlier cannot exist. The proof for substrings of $\tau$ of the shape $xY^{\prime}\delta$ is similar.

Thus one may safely assume that (i) $Y^{x_{1}}_{\ell}=\rho_{1}$ , (ii) $Y^{x_{n}}_{r}=\rho_{n}$ , and (iii) for all substrings of $\tau$ of the shape $\delta Y^{\prime}x$ (resp. $xY^{\prime}\delta$ ), where $\delta\in\Sigma,Y^{\prime}\in Y^{*}$ and $x\in\mbox{Var}(\pi)$ , we have $Y^{x}_{\ell}=Y^{\prime}$ (resp. $Y^{x}_{r}=Y^{\prime}$ ).

We next observe that for any $x_{i}\in\mbox{Var}(\pi)$ and $y\in\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ , $\#(y)[Z^{x_{i}}_{\ell}]\leq\#(y)[Y^{x_{i}}_{\ell}]$ . To see this, suppose first that there is an occurrence of $Y^{x_{i}}_{\ell}x_{i}$ that is not immediately preceded by any $y\in Y$ . Then $Z^{x_{i}}_{\ell}=Y^{x_{i}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ and thus for all $y\in\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ , $\#(y)[Z^{x_{i}}_{\ell}]\leq\#(y)[Y^{x_{i}}_{\ell}]$ . Second, suppose that every occurrence of $Y^{x_{i}}_{\ell}$ is immediately preceded by some $y\in Y$ . Thus, by the choice of $Y^{x_{i}}_{\ell}$ , there must exist distinct $y^{\prime},y^{\prime\prime}\in Y$ such that $y^{\prime}Y^{x_{i}}_{\ell}x_{i}$ and $y^{\prime\prime}Y^{x_{i}}_{\ell}x_{i}$ are substrings of $\tau$ . By the definition of $S^{x_{i}}$ , $y^{\prime},y^{\prime\prime}\in S^{x_{i}}$ . Hence both $y^{\prime}Y^{x_{i}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}x_{i}$ and $y^{\prime\prime}Y^{x_{i}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}x_{i}$ are substrings of $\tau^{\prime}$ , and therefore $Z^{x_{i}}_{\ell}$ is a suffix of $Y^{x_{i}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ . Consequently, $\#(y)[Z^{x_{i}}_{\ell}]\leq\#(y)[Y^{x_{i}}_{\ell}]$ for all $y\in\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ , as required. Similarly, for any $x_{i}\in\mbox{Var}(\pi)$ and $y\in\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ , $\#(y)[Z^{x_{i}}_{r}]\leq\#(y)[Y^{x_{i}}_{r}]$ .

For every $x_{i}\in\mbox{Var}(\pi)$ , let $\alpha_{i}$ be the longest suffix of $Y^{x_{i}}_{\ell}$ such that $\alpha_{i}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ $=Z^{x_{i}}_{\ell}$ and let $\beta_{i}$ be the shortest prefix of $Y^{x_{i}}_{r}$ such that $\beta_{i}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}=Z^{x_{i}}_{r}$ (by the remarks in the preceding paragraph, such $\alpha_{i}$ and $\beta_{i}$ exist). Set $g^{\prime}(x_{i})=\alpha_{i}x_{i}\beta_{i}$ . For example, suppose $Y^{x_{i}}_{\ell}=y_{1}y_{2}^{2}y_{3}y_{1}y_{3}y_{4}y_{1}$ , $Y^{x_{i}}_{r}=y_{1}y_{2}y_{3}y_{1}y_{3}y_{2}y_{4}$ , $Z^{x_{i}}_{\ell}=y_{1}^{2}$ , $Z^{x_{i}}_{r}=y_{1}y_{2}y_{1}$ and $\bigcup_{x\in\mbox{Var}(\pi)}S^{x}=\{y_{1},y_{2}\}$ . Then $\alpha_{i}=y_{3}y_{1}y_{3}y_{4}y_{1}$ and $\beta_{i}=y_{1}y_{2}y_{3}y_{1}$ .
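The computation of $\alpha_{i}$ and $\beta_{i}$ in the example above can be reproduced with a few lines of code. This is only a sketch: words over $Y$ are modelled as Python lists of variable names, and the function names are hypothetical.

```python
def restrict(word, symbols):
    """Erase every symbol of word that is not in symbols."""
    return [s for s in word if s in symbols]

def longest_suffix_with_restriction(word, symbols, target):
    """Longest suffix of word whose restriction to symbols equals target (None if absent)."""
    for start in range(len(word) + 1):          # start = 0 is the whole word
        if restrict(word[start:], symbols) == target:
            return word[start:]
    return None

def shortest_prefix_with_restriction(word, symbols, target):
    """Shortest prefix of word whose restriction to symbols equals target (None if absent)."""
    for end in range(len(word) + 1):            # end = 0 is the empty prefix
        if restrict(word[:end], symbols) == target:
            return word[:end]
    return None

S = {"y1", "y2"}
Y_left = ["y1", "y2", "y2", "y3", "y1", "y3", "y4", "y1"]
Y_right = ["y1", "y2", "y3", "y1", "y3", "y2", "y4"]
Z_left = ["y1", "y1"]
Z_right = ["y1", "y2", "y1"]

alpha = longest_suffix_with_restriction(Y_left, S, Z_left)
beta = shortest_prefix_with_restriction(Y_right, S, Z_right)
print(alpha)  # ['y3', 'y1', 'y3', 'y4', 'y1']
print(beta)   # ['y1', 'y2', 'y3', 'y1']
```

Choosing the longest suffix for $\alpha_{i}$ and the shortest prefix for $\beta_{i}$ means that symbols outside $\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ lying between two variables are always attributed to the variable on their right.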

It remains to verify that $g^{\prime}(\pi)=\tau$ . By the present case assumption, every occurrence of any substring $Z\in Y^{*}$ of $\tau$ is contained in a substring $\theta$ of $\tau$ satisfying at least one of the following: (a) $\theta=Zx_{1}$ and $\theta$ is a prefix of $\tau$ ; (b) $\theta=x_{n}Z$ and $\theta$ is a suffix of $\tau$ ; (c) $\theta=x_{i}Zx_{j}$ , for some $x_{i},x_{j}\in\mbox{Var}(\pi)$ ; (d) $\theta=\delta Zx_{i}$ for some $\delta\in\Sigma$ and $x_{i}\in\mbox{Var}(\pi)$ ; (e) $\theta=x_{i}Z\delta$ for some $\delta\in\Sigma$ and $x_{i}\in\mbox{Var}(\pi)$ . Thus, since $g^{\prime}(x_{i})=\alpha_{i}x_{i}\beta_{i}$ for all $x_{i}\in\mbox{Var}(\pi)$ , it suffices to show: (a) $\alpha_{1}=\rho_{1}$ ; (b) $\beta_{n}=\rho_{n}$ ; (c) if $x_{i}Zx_{j}$ is a substring of $\tau$ for some $Z\in Y^{*}$ , then $\beta_{i}\alpha_{j}=Z$ ; (d) if $\delta Zx_{i}$ is a substring of $\tau$ for some $Z\in Y^{*}$ and $\delta\in\Sigma$ , then $\alpha_{i}=Z$ ; (e) if $x_{i}Z\delta$ is a substring of $\tau$ for some $Z\in Y^{*}$ and $\delta\in\Sigma$ , then $\beta_{i}=Z$ .

Assertion (a):

Note that since $Y^{x_{1}}_{\ell}=\rho_{1}$ and $Z^{x_{1}}_{\ell}=\rho^{\prime}_{1}=\rho_{1}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ , we have $Z^{x_{1}}_{\ell}=Y^{x_{1}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ . Consequently, $\alpha_{1}=Y^{x_{1}}_{\ell}=\rho_{1}$ .

Assertion (b):

An argument similar to that in the proof of Assertion (a) yields $Z^{x_{n}}_{r}=Y^{x_{n}}_{r}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ . Furthermore, since, if $\rho_{n}\neq\varepsilon$ , $S^{x_{n}}$ must contain the last variable occurring in $\rho_{n}$ ( $=Y^{x_{n}}_{r}$ ), the shortest prefix of $Y^{x_{n}}_{r}$ whose restriction to $\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ equals $Z^{x_{n}}_{r}$ is $Y^{x_{n}}_{r}$ . Therefore $\beta_{n}=Y^{x_{n}}_{r}=\rho_{n}$ .

Assertion (c):

Suppose that for some $x_{i},x_{j}\in\mbox{Var}(\pi)$ and $Z\in Y^{*}$ , $x_{i}Zx_{j}$ is a substring of $\tau$ . One must show $\beta_{i}\alpha_{j}=Z$ .

First, suppose $Z^{x_{i}}_{r}=\varepsilon$ . Then $Z^{x_{j}}_{\ell}=Z{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ . Since $Z^{x_{j}}_{\ell}$ is a suffix of $Y^{x_{j}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ , it follows that $Z{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ is a suffix of $Y^{x_{j}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ . As $Y^{x_{j}}_{\ell}$ is a suffix of $Z$ , one also has that $Y^{x_{j}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ is a suffix of $Z{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ , and therefore $Y^{x_{j}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}=Z{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ . If $Y^{x_{j}}_{\ell}\neq Z$ , then there is some $y\in Y$ immediately preceding $Y^{x_{j}}_{\ell}$ in $Z$ such that $y\in S^{x_{j}}$ , implying $Z{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}\neq Y^{x_{j}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ . Hence $Y^{x_{j}}_{\ell}=Z$ . Since $\alpha_{j}$ is the longest suffix of $Y^{x_{j}}_{\ell}$ whose restriction to $\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ equals $Z^{x_{j}}_{\ell}$ and

[TABLE]

we have $\alpha_{j}=Y^{x_{j}}_{\ell}$ . Furthermore, since $\beta_{i}$ is the shortest prefix of $Y^{x_{i}}_{r}$ whose restriction to $\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ equals $Z^{x_{i}}_{r}$ , one has $\beta_{i}=\varepsilon$ , and so $\beta_{i}\alpha_{j}=Y^{x_{j}}_{\ell}=Z$ .

Second, suppose $Z^{x_{i}}_{r}\neq\varepsilon$ . Then $Y^{x_{i}}_{r}\neq\varepsilon$ . Recall that $\alpha_{j}$ is the longest suffix of $Y^{x_{j}}_{\ell}$ whose restriction to $\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ equals $Z^{x_{j}}_{\ell}$ , and that $Y^{x_{j}}_{\ell}$ is a suffix of $Z$ . Let $Z=\gamma\alpha_{j}$ , where $\gamma\in Y^{*}$ . Since $Z^{x_{i}}_{r}\neq\varepsilon$ , $\gamma\neq\varepsilon$ . In particular, note that $\gamma[|\gamma|]\in\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ due to the following reasons: if $\alpha_{j}=Y^{x_{j}}_{\ell}$ , then $\gamma[|\gamma|]\in S^{x_{j}}$ by the definition of $S^{x_{j}}$ ; if $\alpha_{j}$ were a proper suffix of $Y^{x_{j}}_{\ell}$ and $\gamma[|\gamma|]\notin\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ , then $\gamma[|\gamma|]\alpha_{j}$ would be a suffix of $Y^{x_{j}}_{\ell}$ longer than $\alpha_{j}$ whose restriction to $\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ equals $Z^{x_{j}}_{\ell}$ . Thus $\gamma[|\gamma|]$ is equal to the last symbol of $Z^{x_{i}}_{r}$ ; denote this symbol by $y$ . One has $\#(y)[\alpha_{j}]=\#(y)[Z^{x_{j}}_{\ell}]$ and thus $\#(y)[\gamma]=\#(y)[Z]-\#(y)[\alpha_{j}]=\#(y)[Z^{x_{i}}_{r}]+\#(y)[Z^{x_{j}}_{\ell}]-\#(y)[\alpha_{j}]=\#(y)[Z^{x_{i}}_{r}]$ . It follows that $\gamma$ is the shortest prefix of $Y^{x_{i}}_{r}$ whose restriction to $\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ equals $Z^{x_{i}}_{r}$ , which means that $\gamma=\beta_{i}$ .
Consequently $\beta_{i}\alpha_{j}=\gamma\alpha_{j}=Z$ , as required.

Assertion (d):

Suppose that for some $\delta\in\Sigma$ , $Z\in Y^{*}$ and $x_{i}\in\mbox{Var}(\pi)$ , $\delta Zx_{i}$ is a substring of $\tau$ . One must show $\alpha_{i}=Z$ . As was proven earlier, $Y^{x_{i}}_{\ell}=Z$ . Hence $Z^{x_{i}}_{\ell}=Z{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}=Y^{x_{i}}_{\ell}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ . Since $\alpha_{i}$ is the longest suffix of $Y^{x_{i}}_{\ell}$ with $\alpha_{i}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}=Z^{x_{i}}_{\ell}$ , one has $\alpha_{i}=Y^{x_{i}}_{\ell}=Z$ .

Assertion (e):

Suppose that for some $\delta\in\Sigma$ , $Z\in Y^{*}$ and $x_{i}\in\mbox{Var}(\pi)$ , $x_{i}Z\delta$ is a substring of $\tau$ . One must show $\beta_{i}=Z$ . First, $Y^{x_{i}}_{r}=Z$ was proven earlier. As in the proof of Assertion (d), $Z^{x_{i}}_{r}=Z{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}=Y^{x_{i}}_{r}{\big{|}}_{\bigcup_{x\in\mbox{Var}(\pi)}S^{x}}$ . Furthermore, since $S^{x_{i}}$ contains the last symbol of $Z$ , $Y^{x_{i}}_{r}$ is the shortest prefix of $Y^{x_{i}}_{r}$ ( $=Z$ ) whose restriction to $\bigcup_{x\in\mbox{Var}(\pi)}S^{x}$ equals $Z^{x_{i}}_{r}$ . Hence $\beta_{i}=Y^{x_{i}}_{r}=Z$ .

T Proof of Theorem 20

Proof. Assertion (i). Suppose $\Sigma=\{0\}$ . Then every $\pi\in\Pi^{1}_{\infty,m}$ is equivalent to a pattern of the shape $0^{k}x_{1}^{p_{1}}\ldots x_{n}^{p_{n}}$ , where $k\geq 0$ and $0\leq p_{1}<\ldots<p_{n}\leq m$ . The constant part of $\pi$ may be taught using the sample $\{(0^{k},+)\}\cup\{(0^{k-i},-)\mathrel{\mathop{\mathchar 58\relax}}1\leq i\leq\min(\{k,m\})\}$ . Furthermore, for each $k$ , there are at most $\sum_{i=0}^{m}{m\choose i}=2^{m}$ many patterns $\pi^{\prime}$ of the shape $0^{k}x_{1}^{p^{\prime}_{1}}\ldots x_{\ell}^{p^{\prime}_{\ell}}$ , where $0\leq p^{\prime}_{1}<\ldots<p^{\prime}_{\ell}\leq m$ . For each such pattern $\pi^{\prime}$ with $L(\pi^{\prime})\neq L(\pi)$ , $\pi^{\prime}$ can be distinguished from $\pi$ using a word in the symmetric difference of $L(\pi)$ and $L(\pi^{\prime})$ . It follows that $\pi$ has a teaching set of size at most $2^{m}+m+1$ , as required.
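
The role of the sample for the constant part in the unary case can be illustrated by brute force. The sketch below is an illustration under stated assumptions, not part of the proof: a unary word $0^{n}$ is encoded by its length $n$, a pattern $0^{k}x_{1}^{p_{1}}\ldots x_{n}^{p_{n}}$ by the pair `(k, exponents)` with positive exponents at most $m$, and all function names are hypothetical. It checks, for small parameters, that only the correct constant part survives the sample.

```python
from itertools import combinations

def sums_up_to(exponents, bound):
    """All totals <= bound that are sums (with repetition) of the exponents."""
    reach = {0}
    for total in range(1, bound + 1):
        if any(p <= total and (total - p) in reach for p in exponents):
            reach.add(total)
    return reach

def in_language(length, k, exponents):
    """Is 0^length generated by the pattern 0^k x_1^{p_1} ... x_n^{p_n}?"""
    if length < k:
        return False
    return (length - k) in sums_up_to(exponents, length - k)

def constant_sample(k, m):
    """The sample {(0^k, +)} plus (0^(k-i), -) for 1 <= i <= min(k, m)."""
    return [(k, True)] + [(k - i, False) for i in range(1, min(k, m) + 1)]

def consistent_constant_parts(k, m):
    """Constant parts k2 for which some exponent set agrees with the sample."""
    sample = constant_sample(k, m)
    survivors = set()
    for k2 in range(k + m + 1):
        for size in range(m + 1):
            # The 2^m subsets of {1, ..., m} are the candidate exponent sets.
            for exps in combinations(range(1, m + 1), size):
                if all(in_language(n, k2, exps) == label for n, label in sample):
                    survivors.add(k2)
    return survivors
```

For example, `consistent_constant_parts(5, 3)` searches all candidate constant parts up to $k+m$ and all $2^{m}$ exponent sets; the sample itself has at most $m+1$ elements, which matches the $m+1$ term in the bound.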

Now suppose $|\Sigma|=\infty$ and $\Sigma\setminus\mbox{Const}(\pi)=\{a_{1},a_{2},a_{3},\ldots\}$ . Let $k$ be the number of distinct variables in $\pi$ . We build a teaching set $T$ for $\pi$ w.r.t. $\Pi^{\infty}_{\infty,m}$ . Let $\tau$ denote any pattern in $\Pi^{\infty}_{\infty,m}$ that is consistent with $T$ . Given $\pi=X_{1}c_{1}X_{2}c_{2}\ldots c_{n-1}X_{n}\in\Pi^{\infty}_{\infty,m}$ , where $X_{1},X_{2},\ldots,X_{n}\in X^{*}$ and $c_{1},c_{2},\ldots,c_{n-1}$ $\in\Sigma^{+}$ , put all $O(2^{|\pi(\varepsilon)|})$ elements of $\{(\pi(\varepsilon),+)\}\cup\{(v,-)\mathrel{\mathop{\mathchar 58\relax}}v\sqsubset\pi(\varepsilon)\}$ into $T$ ; these examples ensure that $\tau(\varepsilon)=\pi(\varepsilon)$ . Next, set $w=\pi[x_{i}\rightarrow a_{i}\mathrel{\mathop{\mathchar 58\relax}}x_{i}\in\mbox{Var}(\pi)]$ and put $(w,+)$ into $T$ . Then $w\in L(\tau)$ implies there is a substitution $g\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ such that for some $S\subseteq\mbox{Var}(\tau)$ , $g(\tau{\big{|}}_{S})=w$ and $g(x)\neq\varepsilon$ for all $x\in\tau{\big{|}}_{S}$ . Fix such an $S$ . Let $g^{\prime}$ be a morphism such that $g^{\prime}(a_{i})=x_{i}$ for all $i$ ; one has $(g^{\prime}\circ g)(\tau{\big{|}}_{S})=\pi$ , and so $L(\pi)\subseteq L(\tau{\big{|}}_{S})\subseteq L(\tau)$ . There are at most $O((1+|\pi|)^{|\pi|})$ patterns $\tau^{\prime}$ (up to equivalence) such that for some substitution $h\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ , $h(\tau^{\prime})=w$ and $h(x)\neq\varepsilon$ for all $x\in\mbox{Var}(\tau^{\prime})$ ; note that each such $\tau^{\prime}$ satisfies $L(\pi)\subseteq L(\tau^{\prime})$ . For each such $\tau^{\prime}$ with $L(\tau^{\prime})\supset L(\pi)$ , pick $w_{\tau^{\prime}}\in L(\tau^{\prime})\setminus L(\pi)$ and put $(w_{\tau^{\prime}},-)$ into $T$ . The latter negative examples ensure that $L(\pi)=L(\tau{\big{|}}_{S})$ . 
Moreover, since $\pi$ is succinct and $L(\pi)=L(\tau{\big{|}}_{S})$ , it follows from Lemma 18 that $\tau{\big{|}}_{S}$ is equal to $\pi$ up to a renaming of variables. Thus, up to a renaming of variables, $\tau\in\pi\shuffle Y^{*}$ for some infinite set $Y$ of variables with $Y\cap\mbox{Var}(\pi)=\emptyset$ . By Lemma 19, there exists some $\tau^{\prime}\in\Pi^{\infty}_{4mk+|\pi|+2,m}$ such that $\tau^{\prime}=\tau{\big{|}}_{S^{\prime}}$ for some finite $S^{\prime}\subseteq Y$ , and if $L(\pi)\subset L(\tau)$ , then $L(\pi)\subset L(\tau^{\prime})$ . For every $\tau^{\prime\prime}\in\left(\Pi^{\infty}_{4mk+|\pi|+2,m}\right)\cap\pi\shuffle Y^{*}$ with $\tau^{\prime\prime}(\varepsilon)=\pi(\varepsilon)$ and $L(\tau^{\prime\prime})\supset L(\pi)$ , pick some $w_{\tau^{\prime\prime}}\in L(\tau^{\prime\prime})\setminus L(\pi)$ and put $(w_{\tau^{\prime\prime}},-)$ into $T$ ; there are at most $O((D+1)^{D})$ many such $\tau^{\prime\prime}$ (up to equivalence), where $D\mathrel{\mathop{\mathchar 58\relax}}=(4mk+|\pi|+2)\cdot m$ . These negative examples ensure that $L(\tau)\not\supset L(\pi)$ . Therefore $L(\tau)=L(\pi)$ , which proves that $T$ is indeed a teaching set of size $O((D+1)^{D})$ for $\pi$ w.r.t. $\Pi^{\infty}_{\infty,m}$ , where $D\mathrel{\mathop{\mathchar 58\relax}}=(4mk+|\pi|+2)\cdot m$ .

Assertion (ii). The first part of this assertion follows quite directly from the proof of a result in [8]. As the latter reference is currently under review, we reproduce the proof here (with a few minor modifications).

In this proof, $\Pi^{z}_{k}$ denotes the class of all $k$ -variable patterns. Let $\pi$ be a given $(k-1)$ -variable pattern in which every variable occurs at most $m$ times, and suppose for the sake of a contradiction that $\pi$ is not simple block-regular but it has a finite teaching set $T$ w.r.t. $\Pi^{z}_{k}$ . Let

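$$\pi=Y_{1}\underbrace{c_{1}}_{I_{1}}Y_{2}\underbrace{c_{2}}_{I_{2}}\ldots\underbrace{c_{n-1}}_{I_{n-1}}Y_{n},\qquad(17)$$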

where $Y_{1},Y_{n}\in X^{*}$ , $Y_{2},\ldots,Y_{n-1}\in X^{+}$ , $c_{1},\ldots,c_{n-1}\in\Sigma^{+}$ and $I_{1},\ldots,I_{n-1}$ are the closed intervals of positions of $\pi$ corresponding, respectively, to the particular occurrences of the constant blocks $c_{1},\ldots,c_{n-1}$ as marked in Equation (17). Fix some $s>\max(\{|\alpha|\mathrel{\mathop{\mathchar 58\relax}}\alpha\in T^{+}\cup T^{-}\cup\{\pi\}\})$ and pick a variable $y\in X\setminus\mbox{Var}(\pi)$ . We consider three cases.

Case 1:

There is a least $i\in\{1,\ldots,n\}$ such that $Y_{i}\neq\varepsilon$ and every variable in $Y_{i}$ occurs at least twice in $\pi$ . We will assume that $2\leq i\leq n-1$ , as the cases $i=1$ and $i=n$ can be handled very similarly. Fix some distinct $a,b\in\Sigma$ such that both $a$ and $b$ differ from the last symbol of $c_{i-1}$ as well as the first symbol of $c_{i}$ (such choices of $a$ and $b$ are possible because $|\Sigma|\geq 4$ ). Suppose $Y_{i}$ starts at the $p^{th}$ position of $\pi$ . We consider two subcases.

Case 1.1:

For every variable $x$ occurring in $Y_{i}$ , $x$ occurs in some $Y_{i^{\prime}}$ with $i^{\prime}\neq i$ . Let $\pi^{\prime}$ be the pattern derived from $\pi$ by inserting $y^{s}$ between the $p^{th}$ and the $(p+1)^{st}$ positions of $\pi$ ; $\pi^{\prime}\in 1\Pi^{z}_{m}$ because no variable of $\pi$ occurs more than $m$ times. Note that $L(\pi)\subseteq L(\pi^{\prime})$ by construction, and so $\pi^{\prime}$ is consistent with $\{(v,+)\mathrel{\mathop{\mathchar 58\relax}}v\in T^{+}\}$ . Moreover, since $|w|>\max(\{|\alpha|\mathrel{\mathop{\mathchar 58\relax}}\alpha\in T^{-}\})$ for all $w\in L(\pi^{\prime})\setminus L(\pi)$ , $\pi^{\prime}$ is also consistent with $\{(v,-)\mathrel{\mathop{\mathchar 58\relax}}v\in T^{-}\}$ . Hence $\pi^{\prime}$ is consistent with $T$ . Furthermore, let $w=\pi^{\prime}[y\rightarrow a]$ . Decompose $w$ as

[TABLE]

where $J_{1},\ldots,J_{i},\ldots,J_{n-1}$ are the closed intervals of positions of $w$ corresponding, respectively, to the particular occurrences of the constant blocks $c_{1},\ldots,c_{i},\ldots,$ $c_{n-1}$ as marked in Equation (18). Assume, by way of contradiction, that there exists a substitution $h\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ such that $h(\pi)=w$ . By the choice of $a$ , ${\mathcal{I}}_{h,\pi}(I_{i-1})$ cannot be an interval starting or ending between $J_{i-1}$ and $J_{i}$ . Furthermore, ${\mathcal{I}}_{h,\pi}(I_{i-1})$ cannot intersect any of the intervals $J_{1},\ldots,J_{i-2}$ because otherwise $\sum_{j=1}^{i-2}\left|{\mathcal{I}}_{h,\pi}(I_{j})\right|$ would be smaller than $\sum_{j=1}^{i-2}\left|J_{j}\right|$ , which is impossible. Similarly, ${\mathcal{I}}_{h,\pi}(I_{i-1})$ cannot intersect any of the intervals $J_{i},\ldots,J_{n-1}$ . Hence ${\mathcal{I}}_{h,\pi}(I_{i-1})=J_{i-1}$ . An analogous argument shows that ${\mathcal{I}}_{h,\pi}(I_{i})=J_{i}$ . It follows that for all $j\in\{1,\ldots,n-1\}$ , ${\mathcal{I}}_{h,\pi}(I_{j})=J_{j}$ . Thus there is a subsequence $Y^{\prime}_{i}$ of $Y_{i}$ such that $Y^{\prime}_{i}\neq\varepsilon$ and $h(Y^{\prime}_{i})=a^{s}$ . However, based on Equation (18) and the fact that every variable of $Y_{i}$ occurs in some $Y_{i^{\prime}}$ with $i^{\prime}\neq i$ , it can be concluded that $h(Y^{\prime}_{i})=\varepsilon$ , which contradicts $h(Y^{\prime}_{i})=a^{s}$ . Thus $w\in L(\pi^{\prime})\setminus L(\pi)$ , and so $T$ cannot be a teaching set for $\pi$ w.r.t. $\Pi^{z}_{k}$ .

Case 1.2:

$Y_{i}$ contains at least one variable that does not occur in any $Y_{j}$ with $j\neq i$ . Let $x_{j_{1}},\ldots,x_{j_{\ell}}$ be all the variables of $Y_{i}$ that do not occur outside $Y_{i}$ , and let $p_{1},p_{2},\ldots,p_{\ell^{\prime}}$ be all the positions of $\pi$ that are occupied by some $x_{j_{q}}$ with $q\in\{1,\ldots,\ell\}$ , where $p_{1}<p_{2}<\ldots<p_{\ell^{\prime}}$ . Let $\pi^{\prime}$ be the pattern derived from $\pi$ by simultaneously inserting $y^{2s-j+1}$ between the $(p_{j}-1)^{st}$ and the $p_{j}^{th}$ positions of $\pi$ for all $j\in\{1,\ldots,\ell^{\prime}\}$ . For example, if $\pi=x_{1}x_{2}ax_{2}x_{3}x_{3}bx_{4}$ and $i=2$ , then $\pi^{\prime}=x_{1}x_{2}ax_{2}y^{2s}x_{3}y^{2s-1}x_{3}bx_{4}$ . Note that $\pi^{\prime}\in 1\Pi^{z}_{m}$ . By construction, $L(\pi)\subseteq L(\pi^{\prime})$ and $\pi^{\prime}$ is consistent with $T$ . Now set $\beta=\pi^{\prime}[y\rightarrow a,x_{j_{q}}\rightarrow b,1\leq q\leq\ell]$ . We argue that $\beta\notin L(\pi)$ . One has that

[TABLE]

By arguing as in Case 1.1, the choice of $a,b$ implies that if $\beta\in L(\pi)$ and $Y^{\prime}_{i}$ is the restriction of $Y_{i}$ to $\{x_{j_{1}},\ldots,x_{j_{\ell}}\}$ , then there is a substitution $h\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ such that $\gamma=h(Y^{\prime}_{i})$ , where $\gamma$ is as defined in Equation (19). Note that $|Y^{\prime}_{i}|=\ell^{\prime}$ .

That $\gamma\neq h\left(Y^{\prime}_{i}\right)$ will follow from Lemma A.2 and the following claim.

Claim T.1

If $\gamma=h(Y^{\prime}_{i})$ , then $\gamma$ has at least $\left|Y^{\prime}_{i}\right|$ cuts relative to $(h,Y^{\prime}_{i})$ .

Proof of Claim T.1. We first decompose $\gamma$ as follows:

[TABLE]

Claim T.1 will follow from the fact that for all $j\in\{1,\ldots,\ell^{\prime}\}$ , $I_{j}$ contains at least one cut-point. First, observe that since $a^{2s}$ occurs exactly once as a substring of $\gamma$ and every variable of $Y^{\prime}_{i}$ occurs at least twice in $Y^{\prime}_{i}$ , there cannot exist any $q\in\{1,\ldots,\ell\}$ such that $a^{2s}$ is a substring of $h(x_{j_{q}})$ . Thus $I_{1}$ must contain at least one cut-point of $\gamma$ . Second, for all $j\in\{2,\ldots,\ell^{\prime}\}$ , $ba^{2s-j+1}b$ occurs exactly once as a substring of $\gamma$ . Arguing as before, we conclude that $I_{j}$ contains at least one cut-point of $\gamma$ . (Claim T.1)

It follows from Lemma A.2 and Claim T.1 that $\gamma\neq h\left(Y^{\prime}_{i}\right)$ and therefore $\beta\notin L(\pi)$ , as desired.
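
The construction of $\pi^{\prime}$ in Case 1.2 can be sketched in code. The encoding is an assumption made for illustration only: a pattern is a list of tokens, tokens starting with `x` are variables, the fresh variable is written `y` and expanded into repeated tokens, constants are single letters, and a concrete value of $s$ is supplied by the caller. The function names are hypothetical.

```python
def is_var(token):
    """Assumed convention: variables are the tokens starting with 'x' or 'y'."""
    return token[0] in "xy"

def variable_blocks(pattern):
    """Position lists of Y_1, ..., Y_n in pi = Y_1 c_1 Y_2 ... c_{n-1} Y_n."""
    blocks, current, prev_const = [], [], False
    for pos, token in enumerate(pattern):
        if is_var(token):
            current.append(pos)
            prev_const = False
        else:
            if not prev_const:          # a new constant block closes Y_j
                blocks.append(current)
                current = []
            prev_const = True
    blocks.append(current)              # Y_n, possibly empty
    return blocks

def pad_private_occurrences(pattern, i, s, fresh="y"):
    """Insert fresh^(2s-j+1) just before the j-th occurrence (from the left)
    of a variable of Y_i that does not occur outside Y_i."""
    block = variable_blocks(pattern)[i - 1]
    inside = set(block)
    outside_vars = {t for pos, t in enumerate(pattern)
                    if pos not in inside and is_var(t)}
    private = [pos for pos in block if pattern[pos] not in outside_vars]
    padding = {pos: 2 * s - j + 1 for j, pos in enumerate(private, start=1)}
    result = []
    for pos, token in enumerate(pattern):
        result.extend([fresh] * padding.get(pos, 0))
        result.append(token)
    return result

# The example from the text, with the concrete (assumed) value s = 2.
pi = "x1 x2 a x2 x3 x3 b x4".split()
pi_prime = pad_private_occurrences(pi, i=2, s=2)
```

With $s=2$ the two occurrences of $x_{3}$ receive the paddings $y^{4}$ and $y^{3}$, in agreement with the shape $x_{1}x_{2}ax_{2}y^{2s}x_{3}y^{2s-1}x_{3}bx_{4}$ given in the example.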

Case 2:

$\pi$ contains a substring of the shape $ab$ , where $a,b\in\Sigma$ ( $a$ and $b$ are not necessarily distinct). Since $|\Sigma|\geq 4$ , one can fix some $c\in\Sigma$ with $c\notin\{a,b\}$ . Let $j_{3}$ be a position of $\pi$ such that $\pi[j_{3}]\pi[j_{3}+1]=ab$ . If $L(\pi)$ had a finite teaching set $T$ w.r.t. $\Pi^{z}$ , then one could argue as in Case 1 that there is a positive $s$ so large that the pattern $\pi^{\prime}$ obtained from $\pi$ by inserting $y^{s}$ between the $j_{3}^{th}$ and $(j_{3}+1)^{st}$ positions of $\pi$ is consistent with $T$ . On the other hand, let $\gamma$ be the string derived from $\pi^{\prime}$ by substituting $c$ for $y$ and $\varepsilon$ for every other variable; note that the number of times the substring $ab$ occurs in $\gamma$ is strictly less than the number of times that $ab$ occurs in $\pi$ , which implies $\gamma\notin L(\pi)$ and so $L(\pi^{\prime})\neq L(\pi)$ . Therefore $\mbox{TD}(\pi,\Pi^{z})=\infty$ .

Case 3:

$\pi$ does not start or end with variables. Suppose $\pi$ starts with the constant symbol $a$ . The proof that $L(\pi)$ has no finite teaching set w.r.t. $\Pi^{z}$ is very similar to that in Case 2; the only difference here is that one chooses some $b\in\Sigma\setminus\{a\}$ and considers $\pi^{\prime}=y^{s}\pi$ for some variable $y\notin\mbox{Var}(\pi)$ and a sufficiently large $s$ . In this case, $b^{s}\pi(\varepsilon)\in L(\pi^{\prime})\setminus L(\pi)$ , and therefore $L(\pi^{\prime})\neq L(\pi)$ . An analogous argument holds if $\pi$ ends with a constant symbol.

Next, we prove the second part of the assertion. Suppose $\pi$ contains a variable $x$ that occurs $\ell$ times for some $\ell>m$ . We build a teaching set $T$ for $\pi$ w.r.t. $1\Pi^{\infty}_{m}$ . First, put the sample $\{(\pi(\varepsilon),+)\}\cup\{(w^{\prime},-)\mathrel{\mathop{\mathchar 58\relax}}w^{\prime}\sqsubset\pi(\varepsilon)\}$ into $T$ ; this sample uniquely identifies the constant part of $\pi$ (i.e. $\pi(\varepsilon)$ ). Second, pick some $a\in\Sigma\setminus\mbox{Const}(\pi)$ and put $(\pi[x\rightarrow a],+)$ into $T$ ; this additional example reduces the version space to all patterns in $1\Pi^{\infty}_{m}\cap\Pi^{\infty}_{\infty,\ell}$ . Since, by Assertion (i), every pattern in $\Pi^{\infty}_{\infty,\ell}$ has a finite TD, this implies that $\mbox{TD}(\pi,1\Pi^{\infty}_{m})\leq\mbox{TD}(\pi,\Pi^{\infty}_{\infty,\ell})<\infty$ , as required. Furthermore, if $\pi$ is simple block-regular, then it follows from [7, Proposition 4] that $\mbox{TD}(\pi,1\Pi^{\infty}_{m})\leq\mbox{TD}(\pi,\Pi^{\infty})<\infty$ .

U Proof of Theorem 21

Proof. Suppose $\Sigma=\{0,1\}$ . Let $\{x_{i,j}\mathrel{\mathop{\mathchar 58\relax}}i,j\in{\mathbb{N}}_{0}\}$ and $\{y_{i,j}\mathrel{\mathop{\mathchar 58\relax}}i,j\in{\mathbb{N}}_{0}\}$ be two disjoint infinite sets of variables. It suffices to show that $\pi$ does not possess a finite tell-tale w.r.t. $\Pi^{2}_{\infty,4,cf}$ (i.e. a finite set $S\subseteq L(\pi)$ such that for all $\tau\in\Pi^{2}_{\infty,4,cf}$ , one has $S\subseteq L(\tau)\subseteq L(\pi)\Rightarrow L(\tau)=L(\pi)$ ). Following the proof in [31], assume, by way of contradiction, that $\pi$ has a finite tell-tale $\{w_{1},\ldots,w_{n}\}$ for some $n\geq 1$ . Without loss of generality, assume that $w_{i}\neq\varepsilon$ for all $i\in\{1,\ldots,n\}$ . For each $i\in\{1,\ldots,n\}$ , there is a substitution $\sigma_{i}\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ witnessing $\sigma_{i}(\pi)=w_{i}$ . Set $\tilde{\sigma}_{i}(\pi)\mathrel{\mathop{\mathchar 58\relax}}=\sigma_{i}(x_{1})\sigma_{i}(x_{2})\sigma_{i}(x_{3})$ . We define, for each $i\in\{1,\ldots,n\}$ , patterns $\gamma_{i,1},\gamma_{i,2}$ and $\gamma_{i,3}$ according to the following case distinction.

Case 1:

There is some $\delta\in\mbox{Const}(w_{i})$ such that the last occurrence of $\delta$ in $\tilde{\sigma}_{i}(\pi)$ is strictly before the $(|\sigma_{i}(x_{1})|+1)$ -st position of $\tilde{\sigma}_{i}(\pi)$ . Set $\gamma_{i,1}=\varepsilon$ . Let $\ell_{1}\mathrel{\mathop{\mathchar 58\relax}}=\left|\tilde{\sigma}_{i}(\pi){\big{|}}_{\{\delta\}}\right|$ $\left(\mbox{resp.~{}}\ell_{2}\mathrel{\mathop{\mathchar 58\relax}}=\left|\tilde{\sigma}_{i}(\pi){\big{|}}_{\{\overline{\delta}\}}\right|\right)$ , i.e. $\ell_{1}$ (resp. $\ell_{2}$ ) is the number of occurrences of $\delta$ (resp. $\overline{\delta}$ ) in $\tilde{\sigma}_{i}(\pi)$ . Note that the case assumption implies $\sigma_{i}(x_{2})\sigma_{i}(x_{3})\in\{\overline{\delta}\}^{*}$ . Suppose $\ell_{1}=2p_{1}+r_{1}$ and $\ell_{2}=2p_{2}+r_{2}$ for some $p_{1},p_{2}\geq 0$ and $r_{1},r_{2}\in\{0,1\}$ . Let $\tau_{i}$ be the pattern derived from $\tilde{\sigma}_{i}(\pi)$ as follows: if $p_{1}\geq 1$ (resp. $p_{2}\geq 1$ ), then for all $j\in\{0,\ldots,p_{1}-1\}$ (resp. $j\in\{0,\ldots,p_{2}-1\}$ ), substitute $x_{i,j}$ (resp. $y_{i,j}$ ) for the $(2j+1)$ -st and $(2j+2)$ -nd occurrences of $\delta$ (resp. $\overline{\delta}$ ) in $\tilde{\sigma}_{i}(\pi)$ , and if $r_{1}=1$ (resp. $r_{2}=1$ ), then substitute $x_{i,p_{1}}$ (resp. $y_{i,p_{2}}$ ) for the $(2p_{1}+1)$ -st (resp. $(2p_{2}+1)$ -st) occurrence of $\delta$ (resp. $\overline{\delta}$ ) in $\tilde{\sigma}_{i}(\pi)$ .

Define $\gamma_{i,2}$ to be the prefix of $\tau_{i}$ of length $|\sigma_{i}(x_{1})|$ and define $\gamma_{i,3}$ to be the suffix of $\tau_{i}$ of length $|\sigma_{i}(x_{2})\sigma_{i}(x_{3})|$ .

Case 2:

Not Case 1. Then for all $\delta\in\mbox{Const}(w_{i})$ , the position of the last occurrence of $\delta$ in $\tilde{\sigma}_{i}(\pi)$ is greater than $|\sigma_{i}(x_{1})|$ . Let $\ell^{\prime}_{1}\mathrel{\mathop{\mathchar 58\relax}}=\left|\tilde{\sigma}_{i}(\pi){\big{|}}_{\{0\}}\right|$ (resp. $\ell^{\prime}_{2}\mathrel{\mathop{\mathchar 58\relax}}=\left|\tilde{\sigma}_{i}(\pi){\big{|}}_{\{1\}}\right|$ ). Suppose $\ell^{\prime}_{1}=2q_{1}+s_{1}$ and $\ell^{\prime}_{2}=2q_{2}+s_{2}$ for some $q_{1},q_{2}\geq 0$ and $s_{1},s_{2}\in\{0,1\}$ . As in Case 1, let $\tau_{i}$ be the pattern derived from $\tilde{\sigma}_{i}(\pi)$ as follows: if $q_{1}\geq 1$ (resp. $q_{2}\geq 1$ ), then for all $j\in\{0,\ldots,q_{1}-1\}$ (resp. $j\in\{0,\ldots,q_{2}-1\}$ ), substitute $x_{i,j}$ (resp. $y_{i,j}$ ) for the $(2j+1)$ -st and $(2j+2)$ -nd occurrences of $0$ (resp. $1$ ) in $\tilde{\sigma}_{i}(\pi)$ , and if $s_{1}=1$ (resp. $s_{2}=1$ ), then substitute $x_{i,q_{1}}$ (resp. $y_{i,q_{2}}$ ) for the $(2q_{1}+1)$ -st (resp. $(2q_{2}+1)$ -st) occurrence of $0$ (resp. $1$ ) in $\tilde{\sigma}_{i}(\pi)$ .

Define $\gamma_{i,1}$ to be the prefix of $\tau_{i}$ of length $|\sigma_{i}(x_{1})|$ , define $\gamma_{i,2}$ to be the substring of $\tau_{i}$ of length $|\sigma_{i}(x_{2})|$ that starts at the $(|\sigma_{i}(x_{1})|+1)$ -st position of $\tau_{i}$ , and define $\gamma_{i,3}$ to be the suffix of $\tau_{i}$ of length $|\sigma_{i}(x_{3})|$ .
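
The pairing step of Case 2 can be made concrete with a short sketch. It assumes that binary words are Python strings, that the variable $x_{i,j}$ (resp. $y_{i,j}$) is encoded as the string `x_i,j` (resp. `y_i,j`), and that the three lengths $|\sigma_{i}(x_{1})|$, $|\sigma_{i}(x_{2})|$, $|\sigma_{i}(x_{3})|$ are passed explicitly; the function names are hypothetical.

```python
from collections import Counter

def pair_letters(word, i):
    """Replace the (2j+1)-st and (2j+2)-nd occurrences of 0 by x_{i,j} and of
    1 by y_{i,j}; an unpaired last occurrence gets the next unused index."""
    names = {"0": "x", "1": "y"}
    seen = Counter()
    tau_i = []
    for letter in word:
        j = seen[letter] // 2           # index of the pair this occurrence joins
        tau_i.append(f"{names[letter]}_{i},{j}")
        seen[letter] += 1
    return tau_i

def split_case2(tau_i, len1, len2, len3):
    """Cut tau_i into gamma_{i,1}, gamma_{i,2}, gamma_{i,3} by the given lengths."""
    assert len(tau_i) == len1 + len2 + len3
    return tau_i[:len1], tau_i[len1:len1 + len2], tau_i[len1 + len2:]

def max_occurrences(tokens):
    """Largest number of times any single variable occurs."""
    return max(Counter(tokens).values(), default=0)

# Assumed example: sigma_i(x1) = "0", sigma_i(x2) = "01", sigma_i(x3) = "0",
# so the concatenation is "0010" and Case 1 does not apply.
tau_1 = pair_letters("0010", 1)
gammas = split_case2(tau_1, 1, 2, 1)
```

By construction every variable of `tau_1` occurs at most twice, which is the property the pairing is designed to give each $\tau_{i}$.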

Set

[TABLE]

In order to derive a contradiction, it will be shown that (a) $\{w_{1},\ldots,w_{n}\}\subseteq L(\tau)$ and (b) $L(\tau)\subset L(\pi)$ . (In [31], $\tau$ is known as a passe-partout for $\pi$ and $\{w_{1},\ldots,w_{n}\}$ .)

Proof of (a). For $i\in\{1,\ldots,n\}$ , let $\varphi_{i}\mathrel{\mathop{\mathchar 58\relax}}X\mapsto\Sigma^{*}$ be the morphism defined as follows. If $\sigma_{i}$ falls into Case 1, let $\delta$ be a letter as defined in Case 1 for $\sigma_{i}$ . For all $j\in{\mathbb{N}}_{0}$ , set $\varphi_{i}(x_{i,j})=\delta$ and $\varphi_{i}(y_{i,j})=\overline{\delta}$ . For all $i^{\prime}\neq i$ and $j\in{\mathbb{N}}_{0}$ , set $\varphi_{i}(x_{i^{\prime},j})=\varphi_{i}(y_{i^{\prime},j})=\varepsilon$ . It may be directly verified that $\varphi_{i}(\tau)=w_{i}$ .

Suppose $\sigma_{i}$ falls into Case 2. For all $j\in{\mathbb{N}}_{0}$ , set $\varphi_{i}(x_{i,j})=0$ and $\varphi_{i}(y_{i,j})=1$ . For all $i^{\prime}\neq i$ and $j\in{\mathbb{N}}_{0}$ , set $\varphi_{i}(x_{i^{\prime},j})=\varphi_{i}(y_{i^{\prime},j})=\varepsilon$ . Then $\varphi_{i}(\tau)=w_{i}$ .

Proof of (b). By Theorem 17, it is enough to show that there is a morphism $\psi\mathrel{\mathop{\mathchar 58\relax}}X^{*}\mapsto X^{*}$ such that $\psi(\pi)=\tau$ but there does not exist any morphism $\theta\mathrel{\mathop{\mathchar 58\relax}}X^{*}\mapsto X^{*}$ for which $\theta(\tau)=\pi$ . For the first part, define, for each $i\in\{1,2,3\}$ , the substitution $\psi(x_{i})=\gamma_{1,i}\ldots\gamma_{n,i}$ . It follows that $\psi(\pi)=\tau$ . For the second part, we first note that by construction, every variable of $\tau$ that occurs exactly twice must belong to $\mbox{Var}(\gamma_{1,2}\ldots\gamma_{n,2}\gamma_{1,3}\ldots\gamma_{n,3})$ . Consequently, for all morphisms $\theta\mathrel{\mathop{\mathchar 58\relax}}X^{*}\mapsto X^{*}$ , if $\theta(\tau)$ contains exactly three variables, each of which occurs exactly twice, then $\theta(\tau)$ is equivalent to one of the following patterns: $x_{1}x_{2}x_{1}x_{2}x_{3}^{2}$ , or $x_{1}x_{2}x_{3}x_{1}x_{2}x_{3}$ , or $x_{1}^{2}x_{2}x_{3}x_{2}x_{3}$ . Thus $\theta(\tau)$ cannot be equivalent to $\pi$ .

We conclude from (a) and (b) that $\{w_{1},\ldots,w_{n}\}$ cannot be a tell-tale for $\pi$ w.r.t. $\Pi^{2}_{\infty,4,cf}$ , contrary to assumption.

V Remark 22

We prove that $T\mathrel{\mathop{\mathchar 58\relax}}=\{(\varepsilon,+),(0^{2}1^{2}0^{2},+),(0,-),(01^{2}0,-),(0^{3},-),((01)^{2}(0^{2}1)^{2}(0^{3}1)^{2}(0^{4}1)^{2},-)\}$ is a teaching set for $\pi\mathrel{\mathop{\mathchar 58\relax}}=x_{1}^{2}x_{2}^{2}x_{3}^{2}$ w.r.t. $\Pi^{2}_{\infty,3}$ . Let $\tau$ be any pattern in $\Pi^{2}_{\infty,3}$ that is consistent with $T$ . Since $\varepsilon\in L(\tau)$ , $\tau$ does not contain any constant symbols. The negative examples $(0,-)$ and $(0^{3},-)$ ensure that every variable of $\tau$ occurs exactly twice. The consistency of $\tau$ with $(01^{2}0,-)$ then implies that $\tau$ is a non-cross pattern, i.e., $\tau$ is equivalent to a pattern of the shape $x_{1}^{2}x_{2}^{2}\ldots x_{k}^{2}$ for some $k$ . Since $0^{2}1^{2}0^{2}\in L(\tau)$ , $k\geq 3$ . Finally, $(01)^{2}(0^{2}1)^{2}(0^{3}1)^{2}(0^{4}1)^{2}\notin L(\tau)$ implies that $k\leq 3$ . Hence $\tau$ is equivalent to $x_{1}^{2}x_{2}^{2}x_{3}^{2}$ .
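
That $\pi=x_{1}^{2}x_{2}^{2}x_{3}^{2}$ itself is consistent with $T$ can be checked mechanically: under erasing substitutions, a word belongs to $L(x_{1}^{2}\ldots x_{k}^{2})$ exactly when it is a concatenation of $k$ squares, each possibly empty. The sketch below uses a small dynamic program over end positions; the function name and the encoding of $T$ as a Python list are assumptions for illustration. It only confirms that $\pi$ agrees with every example in $T$, and it does not verify uniqueness within $\Pi^{2}_{\infty,3}$, which is what the argument above establishes.

```python
def is_product_of_squares(word, k):
    """Is `word` a concatenation of k squares uu, where u may be empty?"""
    n = len(word)
    reachable = {0}                      # prefixes covered by the squares so far
    for _ in range(k):
        nxt = set(reachable)             # the next square may be empty
        for start in reachable:
            for half in range(1, (n - start) // 2 + 1):
                left = word[start:start + half]
                right = word[start + half:start + 2 * half]
                if left == right:
                    nxt.add(start + 2 * half)
        reachable = nxt
    return n in reachable

# The sample T from Remark 22; True marks a positive example.
long_word = "01" * 2 + "001" * 2 + "0001" * 2 + "00001" * 2
T = [
    ("", True),
    ("001100", True),
    ("0", False),
    ("0110", False),
    ("000", False),
    (long_word, False),
]

pi_is_consistent = all(is_product_of_squares(w, 3) == label for w, label in T)
```

The last negative example is a concatenation of four squares but not of three, which is why it separates $k=3$ from every larger $k$.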

Bibliography

[1] A. V. Aho. Algorithms for finding patterns in strings. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science, Vol. A: Algorithms and Complexity, chapter 5, pages 257–300. MIT Press, Oxford, 1990.
[2] A. Amir and I. Nor. Generalized function matching. J. Disc. Algo., 5(3):514–523, 2007.
[3] D. Angluin. Finding patterns common to a set of strings. J. Comput. Syst. Sci., 21:46–62, 1980.
[4] D. Angluin. Inductive inference of formal languages from positive data. Information and Control, 45(2):117–135, 1980.
[5] D. Angluin, J. Aspnes, S. Eisenstat and A. Kontorovich. On the learnability of shuffle ideals. J. Mach. Learn. Res., 14:1513–1531, 2013.
[6] B. S. Baker. Parameterized pattern matching: Algorithms and applications. J. Comput. Syst. Sci., 52(1):28–42, 1996.
[7] F. Bayeh, Z. Gao and S. Zilles. Erasing pattern languages distinguishable by a finite number of strings. In ALT, pages 72–108, 2017.
[8] F. Bayeh, Z. Gao and S. Zilles. Erasing pattern languages distinguishable by a finite number of strings, 2018. Manuscript under review.
