Pushing for weighted tree automata

Thomas Hanneforth; Andreas Maletti; Daniel Quernheim

arXiv:1702.00304·cs.FL·June 22, 2023



Open Access

TL;DR

This paper introduces a weight normalization procedure called pushing for weighted tree automata over commutative semifields, enabling efficient minimization and equivalence testing, especially for bottom-up deterministic automata.

Contribution

It presents a novel pushing normalization method for weighted tree automata that preserves recognized languages and improves equivalence testing efficiency.

Findings

01

Normalization preserves recognized weighted tree languages.

02

New equivalence test runs in near-linear time, improving over previous methods.

03

Reduction to unweighted automata simplifies minimization and testing.

Abstract

A weight normalization procedure, commonly called pushing, is introduced for weighted tree automata (wta) over commutative semifields. The normalization preserves the recognized weighted tree language even for nondeterministic wta, but it is most useful for bottom-up deterministic wta, where it can be used for minimization and equivalence testing. In both applications a careful selection of the weights to be redistributed followed by normalization allows a reduction of the general problem to the corresponding problem for bottom-up deterministic unweighted tree automata. This approach was already successfully used by Mohri and Eisner for the minimization of deterministic weighted string automata. Moreover, the new equivalence test for two wta $M$ and $M^{\prime}$ runs in time $\mathcal{O}\bigl((\lvert M\rvert+\lvert M^{\prime}\rvert)\log(\lvert Q\rvert+\lvert Q^{\prime}\rvert)\bigr)$, where $Q$ and $Q^{\prime}$ are…

Equations (69)

$\lvert M\rvert=\sum_{t\to q\in\operatorname{supp}(\mu)}\bigl(\lvert t\rvert+1\bigr)\enspace.$

$h_{\mu}(p\to q)$

$h_{\mu}(\sigma(t_{1},\dotsc,t_{k})\to q)$

$h_{\mu}^{(2)}(c[q_{1}])\cdot\chi_{F}\bigl(h_{\mu}^{(1)}(c[q_{1}])\bigr)=s\cdot h_{\mu}^{(2)}(c[q_{2}])\cdot\chi_{F}\bigl(h_{\mu}^{(1)}(c[q_{2}])\bigr)\enspace,$

$\lambda(q)=1=h_{\mu}^{(2)}(q)=h_{\mu}^{(2)}({\scriptstyle\Box}[q])$

$\lambda(q)$

$=h_{\mu}^{(2)}\bigl(\operatorname{sol}(B)[c[q]]\bigr)=h_{\mu}^{(2)}\bigl((\operatorname{sol}(B)[c])[q]\bigr)\enspace,$

$\lambda(q_{1})=\lambda(q_{f})=1,\quad\lambda(q_{2})=2,\quad\text{and}\quad\lambda(q_{b})=8\enspace.$

$\mu^{\prime}(\sigma(q_{1},\dotsc,q_{k})\to q)=\lambda(q)\cdot\mu(\sigma(q_{1},\dotsc,q_{k})\to q)\cdot\prod_{i=1}^{k}\lambda(q_{i})^{-1}\enspace.$

$h_{\mu^{\prime}}(t\to q)$

$\displaystyle=\sum_{q_{1},\dotsc,q_{k}\in Q}\lambda(q)\cdot\mu(\sigma(q_{1},\dotsc,q_{k})\to q)\cdot\prod_{i=1}^{k}\lambda(q_{i})^{-1}\cdot\prod_{i=1}^{k}\Bigl(\lambda(q_{i})\cdot h_{\mu}(t_{i}\to q_{i})\Bigr)$

$=\lambda(q)\cdot h_{\mu}(t\to q)\enspace.$

$M^{\prime}(t)$

$\lambda(q_{2})\cdot\mu(\sigma(q_{b},q_{f})\to q_{2})\cdot\lambda(q_{b})^{-1}\cdot\lambda(q_{f})^{-1}=2\cdot 4\cdot 8^{-1}\cdot 1^{-1}=1\enspace.$

$\mu^{\prime\prime}(\langle\sigma,s\rangle(q_{1},\dotsc,q_{k})\to q)=1\iff\mu(\sigma(q_{1},\dotsc,q_{k})\to q)=s\enspace.$

$\mu^{(1)}(\sigma(q_{1},\dotsc,q_{k}))\equiv_{M}\mu^{(1)}(\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{k}))$

$(\mu^{\prime})^{(2)}(\sigma(q_{1},\dotsc,q_{k}))=s=(\mu^{\prime})^{(2)}(\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{k}))\enspace,$

$\equiv_{M}(\mu^{\prime\prime})^{(1)}(\langle\sigma,s\rangle(q_{1},\dotsc,q_{k}))$

$\equiv_{M}\mu^{(1)}(\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{k}))$

$(\mu^{\prime})^{(2)}(\sigma(q_{1},\dotsc,q_{k}))=(\mu^{\prime})^{(2)}(\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{k}))$

$(\mu^{\prime})^{(2)}(\sigma(q_{1},\dotsc,q_{k}))$

$(\mu^{\prime})^{(2)}(\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{k}))$

$=\lambda(\mu^{(1)}(c_{j}[q_{j}]))\cdot\mu^{(2)}(c_{j}[q_{j}])\cdot\prod_{i=1}^{j-1}\lambda(q^{\prime}_{i})^{-1}\cdot\prod_{i=j}^{k}\lambda(q_{i})^{-1}$

$=\lambda(\mu^{(1)}(c_{j}[q^{\prime}_{j}]))\cdot\mu^{(2)}(c_{j}[q^{\prime}_{j}])\cdot\prod_{i=1}^{j}\lambda(q^{\prime}_{i})^{-1}\cdot\prod_{i=j+1}^{k}\lambda(q_{i})^{-1}$

$\frac{\lambda(q_{j})}{\lambda(q^{\prime}_{j})}$

$\frac{\lambda(p_{j})}{\lambda(p^{\prime}_{j})}$

$=\frac{\lambda(\mu^{(1)}(c_{j}[q_{j}]))\cdot\mu^{(2)}(c_{j}[q_{j}])\cdot\prod_{i=1}^{j-1}\lambda(q^{\prime}_{i})^{-1}\cdot\prod_{i=j}^{k}\lambda(q_{i})^{-1}}{\lambda(\mu^{(1)}(c_{j}[q^{\prime}_{j}]))\cdot\mu^{(2)}(c_{j}[q^{\prime}_{j}])\cdot\prod_{i=1}^{j}\lambda(q^{\prime}_{i})^{-1}\cdot\prod_{i=j+1}^{k}\lambda(q_{i})^{-1}}$

$=\frac{\lambda(p_{j})\cdot\mu^{(2)}(c_{j}[q_{j}])\cdot\lambda(q_{j})^{-1}}{\lambda(p^{\prime}_{j})\cdot\mu^{(2)}(c_{j}[q^{\prime}_{j}])\cdot\lambda(q^{\prime}_{j})^{-1}}\overset{\eqref{eq:m2}}{=}\frac{h_{\mu}^{(2)}(c[p_{j}])\cdot\mu^{(2)}(c_{j}[q_{j}])}{h_{\mu}^{(2)}(c[p^{\prime}_{j}])\cdot\mu^{(2)}(c_{j}[q^{\prime}_{j}])}\cdot\frac{\lambda(q^{\prime}_{j})}{\lambda(q_{j})}\overset{\eqref{eq:m1}}{=}1$

$(\mu^{\prime})^{(2)}(\sigma(q_{1},\dotsc,q_{k}))$

$\overset{\eqref{eq:m3}}{=}\lambda(\mu^{(1)}(c_{1}[q_{1}]))\cdot\mu^{(2)}(c_{1}[q_{1}])\cdot\prod_{i=1}^{0}\lambda(q^{\prime}_{i})^{-1}\cdot\prod_{i=1}^{k}\lambda(q_{i})^{-1}$

$\overset{\eqref{eq:mm}}{=}\lambda(\mu^{(1)}(c_{1}[q^{\prime}_{1}]))\cdot\mu^{(2)}(c_{1}[q^{\prime}_{1}])\cdot\prod_{i=1}^{1}\lambda(q^{\prime}_{i})^{-1}\cdot\prod_{i=2}^{k}\lambda(q_{i})^{-1}$

$\overset{\eqref{eq:m3}}{=}\lambda(\mu^{(1)}(c_{2}[q_{2}]))\cdot\mu^{(2)}(c_{2}[q_{2}])\cdot\prod_{i=1}^{1}\lambda(q^{\prime}_{i})^{-1}\cdot\prod_{i=2}^{k}\lambda(q_{i})^{-1}$

$\dots$

$\overset{\eqref{eq:mm}}{=}\lambda(\mu^{(1)}(c_{k}[q^{\prime}_{k}]))\cdot\mu^{(2)}(c_{k}[q^{\prime}_{k}])\cdot\prod_{i=1}^{k}\lambda(q^{\prime}_{i})^{-1}\cdot\prod_{i=k+1}^{k}\lambda(q_{i})^{-1}$

$\overset{\eqref{eq:m4}}{=}\lambda(\mu^{(1)}(\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{k})))\cdot\mu^{(2)}(\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{k}))\cdot\prod_{i=1}^{k}\lambda(q^{\prime}_{i})^{-1}$

$\overset{\eqref{eq:m4}}{=}(\mu^{\prime})^{(2)}(\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{k}))\enspace,$

$\iff q\sim_{M}p$

$\iff\{c\in C_{\Sigma}(Q)\mid h_{\mu}^{(1)}(c[q])\in F\}=\{c\in C_{\Sigma}(Q)\mid h_{\mu}^{(1)}(c[p])\in F\}$

$\iff\{c\in C_{\Sigma}\mid h_{\mu}^{(1)}(c[q])\in F\}=\{c\in C_{\Sigma}\mid h_{\mu}^{(1)}(c[p])\in F\}$


Taxonomy

Topics: semigroups and automata theory · Natural Language Processing Techniques · Machine Learning and Algorithms

Full text

Submitted Feb. 02, 2017; published Jan. 16, 2018.

ACM CCS: [Theory of computation]: Formal languages and automata theory → Tree languages; [Theory of computation]: Formal languages and automata → Automata extensions → Quantitative automata. AMS subject classification: 68Q45, 68Q25.

*This is a revised and extended version of [Maletti, Quernheim: Pushing for weighted tree automata. Proc. 36th Int. Conf. Mathematical Foundations of Computer Science, LNCS 6907, pp. 460–471, 2011].

Pushing for weighted tree automata*

— dedicated to the memory of Zoltán Ésik (1951–2016) —

Thomas Hanneforth

Universität Potsdam, Human Sciences Faculty, Department Linguistik, Karl-Liebknecht-Str. 24–25, 14476 Potsdam, Germany


Andreas Maletti

Universität Leipzig, Faculty of Mathematics and Computer Science, Institute of Computer Science, PO box 100 920, 04009 Leipzig, Germany


Daniel Quernheim

Universität Stuttgart, Institute for Natural Language Processing, Pfaffenwaldring 5b, 70569 Stuttgart, Germany


Abstract.

A weight normalization procedure, commonly called pushing, is introduced for weighted tree automata (wta) over commutative semifields. The normalization preserves the recognized weighted tree language even for nondeterministic wta, but it is most useful for bottom-up deterministic wta, where it can be used for minimization and equivalence testing. In both applications a careful selection of the weights to be redistributed followed by normalization allows a reduction of the general problem to the corresponding problem for bottom-up deterministic unweighted tree automata. This approach was already successfully used by Mohri and Eisner for the minimization of deterministic weighted string automata. Moreover, the new equivalence test for two wta $M$ and $M^{\prime}$ runs in time $\mathcal{O}\bigl{(}(\lvert M\rvert+\lvert M^{\prime}\rvert)\log{(\lvert Q\rvert+\lvert Q^{\prime}\rvert)}\bigr{)}$ , where $Q$ and $Q^{\prime}$ are the states of $M$ and $M^{\prime}$ , respectively, which improves the previously best run-time $\mathcal{O}\bigl{(}\lvert M\rvert\cdot\lvert M^{\prime}\rvert\bigr{)}$ .

Key words and phrases:

pushing — weighted tree automaton — minimization — equivalence testing

Financially supported by the German Research Foundation (DFG) grant MA 4959/1-1.

1. Introduction

Weighted tree automata [FV09] have recently found various applications in fields as diverse as natural language and XML processing [KM09], system verification [Jac11], and pattern recognition. Most applications require efficient algorithms for basic manipulations of tree automata such as determinization [BMV10], inference [MKV10], and minimization [HMM09, HMM07]. For example, in the system verification domain the properties to be verified are typically easily expressed as a formula in a logic. It is well-known [TW68] that tree automata are as expressive as monadic second-order logic with two successors. This celebrated result was recently generalized to the weighted setting for various weight structures [DV06, Man08, DGMM11, VDH16], so quantitative specifications are readily available. However, one of the main insights gained in the development of the MONA toolkit [KM01] (or the SPASS system [WDF+09]) was that the transformation of a formula into an equivalent tree automaton relies heavily on the minimization of the constructed deterministic tree automata, since the automata otherwise grow far too quickly. Similarly, a major inference setup, also used in the synthesis subfield of system verification, is Angluin’s minimally adequate teacher setup [Ang87]. In this setup, the learner has access to an oracle that correctly answers two kinds of queries: coefficient queries, which ask for the coefficient of a tree in the weighted tree language to be learned, and equivalence queries, which ask for a certificate that a proposed weighted tree automaton indeed represents that weighted tree language. In implementations of the oracle, the latter queries are typically answered by equivalence tests.

As already mentioned, quantitative models have recently enjoyed a lot of attention. For example, in natural language processing, weighted devices are often used to model probabilities, cost functions, or other features. In this contribution, we consider pushing [Moh97, Eis03] for weighted tree automata [BR82, FV09] over commutative semifields [HW98, Gol99]. Roughly speaking, pushing moves transition weights along a path. If the weights are properly selected, then pushing can be used to canonicalize a (bottom-up) deterministic weighted tree automaton [Bor05]. The obtained canonical representation has the benefit that it can be minimized using unweighted minimization, in which the weight is treated as a transition label. This strategy has successfully been employed in [Moh97, Eis03] for deterministic weighted (finite-state) string automata, and similar approaches have been used to minimize sequential transducers [Cho03] and bottom-up tree transducers [FSM11]. Here we adapt the strategy for tree automata. In particular, we improve the currently best minimization algorithm [Mal09] for a deterministic weighted tree automaton $M$ with states $Q$ from $\mathcal{O}\bigl{(}\lvert M\rvert\cdot\lvert Q\rvert\bigr{)}$ to $\mathcal{O}\bigl{(}\lvert M\rvert\log{\lvert Q\rvert}\bigr{)}$ , which coincides with the complexity of minimization in the unweighted case [HMM09]. The improvement is achieved by a careful selection of the signs of life [Mal09]. Intuitively, a sign of life for a state $q$ is a context that takes $q$ into a final state. In [Mal09] the signs of life are computed by a straightforward exploration algorithm, which is very efficient, but does not guarantee that states that are later checked for equivalence receive the same sign of life. During the (pair-wise) equivalence checks in [Mal09] the evaluation of the weight of a state in the sign of life of another state thus becomes unavoidable, which causes the increased complexity. 
In this contribution, we precompute an equivalence relation, which, in general, is still coarser than the state equivalence to be determined, but equivalent states in this equivalence permit the same sign of life. Then we determine a sign of life for each equivalence class. Later we only refine this equivalence relation to obtain the state equivalence, so each state will only be evaluated in its sign of life and this evaluation can be precomputed. Moreover, the weights obtained in this evaluation, also called pushing weights, allow a proper canonicalization in the sense that equivalent states will have exactly the same weights on corresponding transitions after pushing. This property sets our algorithm apart from Algorithm 1 of [Mal09] and allows us to rely on unweighted minimization [HMM09]. Our pushing procedure, which is defined for general (potentially nondeterministic) weighted tree automata, always preserves the semantics, so it might also be useful in other setups.

Second, we apply pushing to the problem of testing equivalence. The currently fastest algorithm [DHM11] for checking equivalence of two deterministic weighted tree automata $M$ and $M^{\prime}$ runs in time $\mathcal{O}\bigl{(}\lvert M\rvert\cdot\lvert M^{\prime}\rvert\bigr{)}$. It is well known that two minimal deterministic weighted tree automata $M$ and $M^{\prime}$ are equivalent if and only if they can be obtained from each other by a pushing operation (with proper pushing weights). In other words, equivalent automata $M$ and $M^{\prime}$ have the same transition structure, but their transition weights can differ by consistent factors. We extend our minimization approach to equivalence testing: we again carefully determine the pushing weight and the sign of life of each state $q$ of $M$ such that $q$ shares its sign of life not only with all equivalent states of $M$ but also with all corresponding states of $M^{\prime}$. This allows us to minimize both input automata, treat the obtained automata as unweighted automata, and test them for isomorphism. This approach reduces the run-time complexity to $\mathcal{O}\bigl{(}(\lvert M\rvert+\lvert M^{\prime}\rvert)\log{(\lvert Q\rvert+\lvert Q^{\prime}\rvert)}\bigr{)}$, where $Q$ and $Q^{\prime}$ are the states of $M$ and $M^{\prime}$, respectively.

2. Preliminaries

We write $\mathbb{N}$ for the set of all nonnegative integers and $[1,u]$ for its subset $\{i\mid 1\leq i\leq u\}$ given $u\in\mathbb{N}$. The $k$-fold Cartesian product of a set $Q$ is written as $Q^{k}$, and the empty tuple $()\in Q^{0}$ is often written as $\varepsilon$. Every finite and nonempty set is also called an alphabet, and its elements are called symbols. A ranked alphabet $(\Sigma,\mathord{\operatorname{rk}})$ consists of an alphabet $\Sigma$ and a mapping $\mathord{\operatorname{rk}}\colon\Sigma\to\mathbb{N}$, which assigns a rank to each symbol. If the ranking ‘$\operatorname{rk}$’ is obvious from the context, then we simply write $\Sigma$ for the ranked alphabet. For each $k\in\mathbb{N}$, we let $\Sigma_{k}$ be the set $\{\sigma\in\Sigma\mid\operatorname{rk}(\sigma)=k\}$ of $k$-ary symbols of $\Sigma$. Moreover, we let $\Sigma(Q)=\{\sigma w\mid\sigma\in\Sigma,\,w\in Q^{\operatorname{rk}(\sigma)}\}$. The set $T_{\Sigma}(Q)$ of all $\Sigma$-trees indexed by $Q$ is inductively defined to be the smallest set $T$ such that $Q\subseteq T$ and $\Sigma(T)\subseteq T$. Instead of $T_{\Sigma}(\emptyset)$ we simply write $T_{\Sigma}$. The size $\lvert t\rvert$ of a tree $t\in T_{\Sigma}(Q)$ is inductively defined by $\lvert q\rvert=1$ for every $q\in Q$ and $\lvert\sigma(t_{1},\dotsc,t_{k})\rvert=1+\sum_{i=1}^{k}\lvert t_{i}\rvert$ for every $k\in\mathbb{N}$, $\sigma\in\Sigma_{k}$, and $t_{1},\dotsc,t_{k}\in T_{\Sigma}(Q)$. To increase readability, we often omit quantifications like “for all $k\in\mathbb{N}$” if they are obvious from the context.

We reserve the use of a special symbol ${\scriptstyle\Box}$ that is not an element in any considered alphabet. Its function is to mark a designated position in certain trees called contexts. Formally, the set $C_{\Sigma}(Q)$ of all $\Sigma$ -contexts indexed by $Q$ is defined as the smallest set $C$ such that ${\scriptstyle\Box}\in C$ and $\sigma(t_{1},\dotsc,t_{i-1},c,t_{i+1},\dotsc,t_{k})\in C$ for every $\sigma\in\Sigma_{k}$ , $t_{1},\dotsc,t_{k}\in T_{\Sigma}(Q)$ , $i\in[1,k]$ , and $c\in C$ . As before, we simplify $C_{\Sigma}(\emptyset)$ to $C_{\Sigma}$ . In simple words, a context is a tree, in which the special symbol ${\scriptstyle\Box}$ occurs exactly once and at a leaf position. Note that $C_{\Sigma}(Q)\cap T_{\Sigma}(Q)=\emptyset$ , but $C_{\Sigma}(Q)\subseteq T_{\Sigma}(Q\cup\{{\scriptstyle\Box}\})$ , which allows us to treat contexts like trees. Given $c\in C_{\Sigma}(Q)$ and $t\in T_{\Sigma}(Q\cup\{{\scriptstyle\Box}\})$ , the tree $c[t]$ is obtained from $c$ by replacing the unique occurrence of ${\scriptstyle\Box}$ in $c$ by $t$ . In particular, $c[c^{\prime}]\in C_{\Sigma}(Q)$ given that $c,c^{\prime}\in C_{\Sigma}(Q)$ .
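To make trees, sizes, and context substitution concrete, here is a minimal Python sketch under a hypothetical encoding that is not part of the paper: a state is a string, a tree $\sigma(t_{1},\dotsc,t_{k})$ is the tuple `(sigma, t1, ..., tk)`, a nullary symbol $\alpha$ is the 1-tuple `(alpha,)`, and the string `"□"` stands for ${\scriptstyle\Box}$. The helper names `size`, `is_context`, and `substitute` are illustrative only.

```python
from typing import Union

BOX = "□"  # stands in for the special hole symbol of a context

# Hypothetical encoding: a state is a string; sigma(t1, ..., tk) is the tuple
# (sigma, t1, ..., tk); a nullary symbol alpha is the 1-tuple (alpha,).
Tree = Union[str, tuple]


def size(t: Tree) -> int:
    """|q| = 1 for a state q and |sigma(t1, ..., tk)| = 1 + |t1| + ... + |tk|."""
    if isinstance(t, str):
        return 1
    return 1 + sum(size(child) for child in t[1:])


def box_count(t: Tree) -> int:
    """Number of occurrences of BOX in t."""
    if isinstance(t, str):
        return 1 if t == BOX else 0
    return sum(box_count(child) for child in t[1:])


def is_context(t: Tree) -> bool:
    """A context contains the hole BOX exactly once (necessarily at a leaf)."""
    return box_count(t) == 1


def substitute(c: Tree, t: Tree) -> Tree:
    """c[t]: replace the unique occurrence of BOX in the context c by t."""
    if isinstance(c, str):
        return t if c == BOX else c
    return (c[0],) + tuple(substitute(child, t) for child in c[1:])


c = ("gamma", BOX)                   # the context gamma(□)
t = ("sigma", "q1", ("alpha",))      # the tree sigma(q1, alpha)
print(size(t))                       # 3
print(substitute(c, t))              # ('gamma', ('sigma', 'q1', ('alpha',)))
print(is_context(substitute(c, c)))  # True: c[c'] is again a context
```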

A commutative semiring [HW98, Gol99] is a tuple $(S,\mathord{+},\mathord{\cdot},0,1)$ such that $(S,\mathord{+},0)$ and $(S,\mathord{\cdot},1)$ are commutative monoids and $s\cdot 0=0$ and $s\cdot(s_{1}+s_{2})=(s\cdot s_{1})+(s\cdot s_{2})$ for all $s,s_{1},s_{2}\in S$ (i.e., $\cdot$ distributes over $+$ ). It is a commutative semifield if $(S\setminus\{0\},\mathord{\cdot},1)$ is a commutative group (i.e., in addition, for every $s\in S\setminus\{0\}$ there exists $s^{-1}\in S$ such that $s\cdot s^{-1}=1$ ). Typical commutative semifields include

•

the Boolean semifield $\mathbb{B}=(\{0,1\},\mathord{\max},\mathord{\min},0,1)$ ,

•

the field $(\mathbb{Q},\mathord{+},\mathord{\cdot},0,1)$ of rational numbers, and

•

the Viterbi semifield $(\mathbb{Q}_{\geq 0},\mathord{\max},\mathord{\cdot},0,1)$ , where $\mathbb{Q}_{\geq 0}=\{q\in\mathbb{Q}\mid q\geq 0\}$ .
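The three example semifields can be packaged as plain records of operations. The sketch below is an illustration only: the `Semifield` record, its field names, and the use of Python's `fractions.Fraction` for exact rationals are assumptions, and the inverse `inv` is only meaningful on non-zero elements.

```python
from fractions import Fraction
from typing import Any, Callable, NamedTuple


class Semifield(NamedTuple):
    """A commutative semifield (S, +, ·, 0, 1) bundled with the inverse of ·."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any
    inv: Callable[[Any], Any]  # only meaningful on S without 0


# Boolean semifield ({0, 1}, max, min, 0, 1); its only non-zero element is 1 = 1^{-1}.
BOOLEAN = Semifield(max, min, 0, 1, lambda s: 1)

# Field of rational numbers (Q, +, ·, 0, 1) with exact arithmetic.
RATIONALS = Semifield(lambda a, b: a + b, lambda a, b: a * b,
                      Fraction(0), Fraction(1), lambda s: 1 / s)

# Viterbi semifield (Q_{>=0}, max, ·, 0, 1) on non-negative fractions.
VITERBI = Semifield(max, lambda a, b: a * b,
                    Fraction(0), Fraction(1), lambda s: 1 / s)


def has_inverse(sf: Semifield, s) -> bool:
    """Check that s is non-zero and s · s^{-1} = 1."""
    return s != sf.zero and sf.times(s, sf.inv(s)) == sf.one


print(VITERBI.plus(Fraction(1, 2), Fraction(1, 3)))  # 1/2 (addition is max)
print(has_inverse(RATIONALS, Fraction(-3, 4)))       # True
```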

Given a mapping $f\colon A\to S$ , we write $\operatorname{supp}(f)$ for the set $\{a\in A\mid f(a)\neq 0\}$ of elements that are mapped via $f$ to a non-zero semiring element.

*For the rest of the paper, let $(S,\mathord{+},\mathord{\cdot},0,1)$ be a commutative semifield.*¹

¹ Clearly, weighted tree automata can also be defined for semirings or even more general weight structures, but already minimization for deterministic finite-state string automata becomes NP-hard for simple semirings that are not semifields (see [Eis03, Section 3]).

A weighted tree automaton [BLB83, Boz99, Kui98, BV03, Bor05, FV09] (for short: wta) is a tuple $M=(Q,\Sigma,\mu,F)$ , in which

•

$Q$ is an alphabet of states,

•

$\Sigma$ is a ranked alphabet of input symbols,

•

$\mu\colon\Sigma(Q)\times Q\to S$ assigns a weight to each transition, and

•

$F\subseteq Q$ is a set of final states.

We often write elements of $T_{\Sigma}(Q)\times Q$ as $t\to q$ instead of $(t,q)$ . The size $\lvert M\rvert$ of the wta $M$ is

$\lvert M\rvert=\sum_{t\to q\in\operatorname{supp}(\mu)}\bigl(\lvert t\rvert+1\bigr)\enspace.$

We extend the transition weight assignment $\mu$ to a mapping $h_{\mu}\colon T_{\Sigma}(Q)\times Q\to S$ by

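$h_{\mu}(p\to q)=\begin{cases}1&\text{if }p=q\\0&\text{otherwise}\end{cases}\qquad\text{and}\qquad h_{\mu}(\sigma(t_{1},\dotsc,t_{k})\to q)=\sum_{q_{1},\dotsc,q_{k}\in Q}\mu(\sigma(q_{1},\dotsc,q_{k})\to q)\cdot\prod_{i=1}^{k}h_{\mu}(t_{i}\to q_{i})$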

for all $p,q\in Q$, $\sigma\in\Sigma_{k}$, and $t_{1},\dotsc,t_{k}\in T_{\Sigma}(Q)$. The wta $M$ recognizes the weighted tree language $M\colon T_{\Sigma}\to S$ such that $M(t)=\sum_{q\in F}h_{\mu}(t\to q)$ for every $t\in T_{\Sigma}$. Two wta $M$ and $M^{\prime}$ are equivalent if their recognized weighted tree languages coincide. The unweighted (finite-state) tree automaton [GS84, GS97, CDG+07] (for short: fta) corresponding to $M$ is $\operatorname{unw}(M)=(Q,\Sigma,\operatorname{supp}(\mu),F)$.² We note that $\operatorname{supp}(M)\subseteq L(\operatorname{unw}(M))$, where $L(\operatorname{unw}(M))$ is the tree language recognized by the fta $\operatorname{unw}(M)$.

² An fta computes in the same manner as a wta over the Boolean semifield $\mathbb{B}$.
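The semantics of a (possibly nondeterministic) wta can be sketched directly from its inductive definition. The sketch assumes the usual recursion $h_{\mu}(p\to q)=1$ if $p=q$ and $0$ otherwise, and $h_{\mu}(\sigma(t_{1},\dotsc,t_{k})\to q)=\sum_{q_{1},\dotsc,q_{k}\in Q}\mu(\sigma(q_{1},\dotsc,q_{k})\to q)\cdot\prod_{i=1}^{k}h_{\mu}(t_{i}\to q_{i})$, the size formula $\lvert M\rvert=\sum_{t\to q\in\operatorname{supp}(\mu)}(\lvert t\rvert+1)$, and weights in the field of rationals. The dictionary encoding of $\mu$ (only $\operatorname{supp}(\mu)$ is stored) and the small example automaton are hypothetical.

```python
from fractions import Fraction

# Hypothetical encoding: states are strings, a tree sigma(t1, ..., tk) is the tuple
# (sigma, t1, ..., tk), and mu is a dict ((sigma, (q1, ..., qk)), q) -> weight that
# stores only supp(mu); absent transitions have weight 0.


def h(mu, t, q):
    """h_mu(t -> q) over the field of rationals."""
    if isinstance(t, str):  # t is a state p: weight 1 if p == q, else 0
        return Fraction(1) if t == q else Fraction(0)
    symbol, children = t[0], t[1:]
    total = Fraction(0)
    for ((sym, sources), target), weight in mu.items():
        # only transitions symbol(q1, ..., qk) -> q contribute; all other terms are 0
        if sym != symbol or target != q or len(sources) != len(children):
            continue
        term = Fraction(weight)
        for child, q_i in zip(children, sources):
            term *= h(mu, child, q_i)
        total += term
    return total


def recognize(mu, final, t):
    """M(t) = sum over final states q of h_mu(t -> q)."""
    return sum((h(mu, t, q) for q in final), Fraction(0))


def wta_size(mu):
    """|M| = sum over t -> q in supp(mu) of (|t| + 1), where |sigma(q1, ..., qk)| = 1 + k."""
    return sum((1 + len(sources)) + 1 for ((_, sources), _) in mu)


mu = {
    (("alpha", ()), "q"): 2,
    (("alpha", ()), "p"): 5,  # nondeterministic: alpha reaches both q and p
    (("sigma", ("q", "q")), "q"): 3,
    (("sigma", ("p", "q")), "q"): 1,
}
t = ("sigma", ("alpha",), ("alpha",))
print(recognize(mu, {"q"}, t))  # 22 = 3*2*2 + 1*5*2
print(wta_size(mu))             # 12 = 2 + 2 + 4 + 4
```

Iterating over $\operatorname{supp}(\mu)$ instead of all of $Q^{k}$ gives the same sum, since every omitted term is $0$.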

The wta $M=(Q,\Sigma,\mu,F)$ is (bottom-up) deterministic (or a dwta) if for every $t\in\Sigma(Q)$ there exists at most one $q\in Q$ such that $t\to q\in\operatorname{supp}(\mu)$. In other words, a wta $M$ is deterministic if and only if $\operatorname{unw}(M)$ is bottom-up deterministic. In a dwta we can (without loss of information) treat $\mu$ and $h_{\mu}$ as partial mappings $\mu\colon\Sigma(Q)\dasharrow Q\times S$ and $h_{\mu}\colon T_{\Sigma}(Q)\dasharrow Q\times S$. We use $\mu^{(1)}$ and $\mu^{(2)}$ as well as $h_{\mu}^{(1)}$ and $h_{\mu}^{(2)}$ for the corresponding projections to the first and second output component, respectively (e.g., $\mu^{(1)}\colon\Sigma(Q)\dasharrow Q$ and $\mu^{(2)}\colon\Sigma(Q)\dasharrow S$). To avoid complicated distinctions, we treat undefinedness like a value (i.e., it is equal to itself, but different from every other value). We observe that $\operatorname{supp}(M)=L(\operatorname{unw}(M))$ for each dwta $M$.³ Moreover, the restriction to final states instead of final weights in the definition of a wta does not restrict the expressive power [Bor05, Lemma 6.1.4], which applies to both wta and dwta. In addition, the transformation of a wta with final weights into an equivalent wta with final states does not introduce new states, so all our results also apply to wta with final weights.

³ The statement holds because each commutative semifield is zero-divisor free [Bor03, Lemma 1].
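For a dwta the sums collapse to a single run, so $\mu$ and $h_{\mu}$ become partial maps. A minimal sketch, assuming a hypothetical dictionary encoding in which only $\operatorname{supp}(\mu)$ is stored and trees are nested tuples; Python's `None` plays the role of "undefined", and the helper names `to_partial` and `run` are illustrative.

```python
from fractions import Fraction


def to_partial(mu):
    """Turn supp(mu) into the partial map Sigma(Q) -> Q x S of a dwta.

    mu is a dict ((sigma, (q1, ..., qk)), q) -> non-zero weight (hypothetical encoding).
    Raises ValueError if some sigma(q1, ..., qk) has two targets, i.e. M is not a dwta.
    """
    delta = {}
    for (lhs, target), weight in mu.items():
        if lhs in delta:
            raise ValueError(f"not bottom-up deterministic at {lhs}")
        delta[lhs] = (target, weight)
    return delta


def run(delta, t):
    """Partial h_mu: T_Sigma(Q) -> Q x S; returns None where h_mu is undefined."""
    if isinstance(t, str):  # a state p evaluates to (p, 1)
        return (t, Fraction(1))
    weight, states = Fraction(1), []
    for child in t[1:]:
        result = run(delta, child)
        if result is None:  # undefinedness propagates upwards
            return None
        states.append(result[0])
        weight *= result[1]
    entry = delta.get((t[0], tuple(states)))
    return None if entry is None else (entry[0], weight * entry[1])


delta = to_partial({(("alpha", ()), "q"): 2, (("gamma", ("q",)), "f"): 3})
print(run(delta, ("gamma", ("alpha",))))  # ('f', Fraction(6, 1))
print(run(delta, ("gamma", "f")))         # None, since gamma(f) has no transition
```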

An equivalence relation $\equiv$ on a set $A$ is a reflexive, symmetric, and transitive subset of $A^{2}$ . The equivalence class (or block) $[a]_{\mathord{\equiv}}$ of the element $a\in A$ is $\{a^{\prime}\in A\mid a\equiv a^{\prime}\}$ , and we let $(A^{\prime}/\mathord{\equiv})=\{[a^{\prime}]_{\mathord{\equiv}}\mid a^{\prime}\in A^{\prime}\}$ for every $A^{\prime}\subseteq A$ . Whenever $\equiv$ is obvious from the context, we simply omit it. The equivalence $\equiv$ respects a set $A^{\prime}\subseteq A$ if $[a]\subseteq A^{\prime}$ or $[a]\subseteq A\setminus A^{\prime}$ for every $a\in A$ (i.e., each equivalence class is either completely in $A^{\prime}$ or completely outside $A^{\prime}$ ).

Let $M=(Q,\Sigma,\mu,F)$ be a dwta. An equivalence relation $\mathord{\equiv}\subseteq Q^{2}$ is a congruence (of $M$) if $\mu^{(1)}(\sigma(q_{1},\dotsc,q_{k}))\equiv\mu^{(1)}(\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{k}))$ for every $\sigma\in\Sigma_{k}$ and all equivalent states $q_{1}\equiv q^{\prime}_{1},\dotsc,q_{k}\equiv q^{\prime}_{k}$. Note that this definition of congruence completely disregards the weights, so $\equiv$ is a congruence for $M$ if and only if $\equiv$ is a congruence for $\operatorname{unw}(M)$. Two states $q_{1},q_{2}\in Q$ are weakly equivalent, written as $q_{1}\sim_{M}q_{2}$, if $h_{\mu}^{(1)}(c[q_{1}])\in F$ if and only if $h_{\mu}^{(1)}(c[q_{2}])\in F$ for all contexts $c\in C_{\Sigma}(Q)$. In other words, weak equivalence coincides with classical equivalence [GS84, Definition II.6.8] for $\operatorname{unw}(M)$. Consequently, the weak equivalence relation $\sim_{M}$ is actually a congruence of $M$ that respects $F$ [GS84, Theorem II.6.10]. The weak equivalence relation $\sim_{M}$ can be computed in time $\mathcal{O}\bigl{(}\lvert M\rvert\log{\lvert Q\rvert}\bigr{)}$ [HMM09]. Finally, two states are (strongly) equivalent, written as $q_{1}\equiv_{M}q_{2}$, if there exists a factor $s\in S\setminus\{0\}$ such that for all $c\in C_{\Sigma}(Q)$ we have

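$h_{\mu}^{(2)}(c[q_{1}])\cdot\chi_{F}\bigl(h_{\mu}^{(1)}(c[q_{1}])\bigr)=s\cdot h_{\mu}^{(2)}(c[q_{2}])\cdot\chi_{F}\bigl(h_{\mu}^{(1)}(c[q_{2}])\bigr)\enspace,$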

where $\chi_{F}\colon Q\to\{0,1\}$ is the characteristic function of $F$; i.e., $\chi_{F}(q)=1$ if and only if $q\in F$ for all $q\in Q$. The equivalence relation $\equiv_{M}$ is called the Myhill-Nerode equivalence relation [Mal09, Definition 3]. It is also a congruence that respects $F$ [Mal09, Lemma 4]. If $M$ is clear from the context, then we just write $\equiv$ instead of $\equiv_{M}$.
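The text cites an $\mathcal{O}\bigl(\lvert M\rvert\log\lvert Q\rvert\bigr)$ algorithm [HMM09] for $\sim_{M}$; the sketch below is not that algorithm but a naive Moore-style partition refinement, included only to make the definitions concrete. Starting from $\{F,Q\setminus F\}$, it splits blocks until the partition is a congruence, which yields the coarsest congruence that respects $F$ (by [GS84, Theorem II.6.10], this is $\sim_{M}$). Undefined transitions are treated as a value of their own, as in the text; to sidestep the difference between an undefined transition and a transition into a dead state, assume every state is live. The encoding and the example dwta are hypothetical.

```python
def _target_block(delta, block, ctx, q):
    """Block of mu^(1)(c[q]) for the one-step context ctx, or None if undefined."""
    symbol, i, others = ctx
    entry = delta.get((symbol, others[:i] + (q,) + others[i:]))
    return None if entry is None else block[entry[0]]


def coarsest_congruence(states, delta, final):
    """Naive Moore-style refinement of the partition {F, Q - F} of a dwta.

    delta: partial map (sigma, (q1, ..., qk)) -> (target, weight). Weights are ignored,
    so this works on unw(M); an undefined transition counts as a value of its own.
    """
    # One-step contexts sigma(q1, ..., q_{i-1}, □, q_{i+1}, ..., qk) that occur in delta,
    # stored as (sigma, i, other sources); every other context is undefined for all states.
    contexts = sorted({(sym, i, srcs[:i] + srcs[i + 1:])
                       for sym, srcs in delta for i in range(len(srcs))})
    block = {q: int(q in final) for q in states}
    while True:
        ids, refined = {}, {}
        for q in sorted(states):
            signature = (block[q], tuple(_target_block(delta, block, c, q) for c in contexts))
            refined[q] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            break  # no block was split, so the partition is stable
        block = refined
    groups = {}
    for q, b in block.items():
        groups.setdefault(b, set()).add(q)
    return {frozenset(g) for g in groups.values()}


delta = {
    ("alpha", ()): ("q1", 1), ("beta", ()): ("qb", 3),
    ("gamma", ("qb",)): ("qf", 8), ("gamma", ("q2",)): ("qf", 2),
    ("sigma", ("qb", "q1")): ("q2", 4), ("sigma", ("qb", "qf")): ("q2", 4),
    ("sigma", ("q2", "q1")): ("qb", 2), ("sigma", ("q2", "qf")): ("qb", 2),
}
blocks = coarsest_congruence({"q1", "qf", "q2", "qb"}, delta, {"q1", "qf"})
print(sorted(sorted(b) for b in blocks))  # [['q1', 'qf'], ['q2', 'qb']]
```

Because each signature includes the old block, every round refines the previous partition, so the loop stops after at most $\lvert Q\rvert$ rounds.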

3. Signs of life

First, we demonstrate how to efficiently compute signs of life (Definition 3), which are evidence that a final state can be reached. Together with these signs of life we also compute a pushing weight for each state (Section 4). Our Algorithm 1 is a straightforward extension of [Mal09, Algorithm 1] that operates on equivalence classes of states (with respect to a congruence that respects finality) instead of individual states.⁴ This change guarantees that equivalent states receive the same sign of life, which is an essential requirement for the algorithms in Sections 5 and 6.

⁴ Note that our algorithm is not simply the previous algorithm executed on the quotient dwta with respect to the congruence. The original dwta is used essentially in the computation of the pushing weights.

Before we start we need to recall the definition of a sign of life [Mal09]. In addition, we recall the relevant properties that we use in our algorithm. For the rest of this section, let $M=(Q,\Sigma,\mu,F)$ be a dwta.

Definition ([Mal09, Section 2]).

A context $c\in C_{\Sigma}(Q)$ is a sign of life for the state $q\in Q$ if $h_{\mu}^{(1)}(c[q])\in F$ . Any state that has a sign of life is live; otherwise it is dead.
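Checking whether a given context is a sign of life only requires running the dwta on $c[q]$. A minimal sketch under a hypothetical encoding (states as strings, trees and contexts as nested tuples, `"□"` for ${\scriptstyle\Box}$, `None` for undefined); the helper names and the example transitions are made up for illustration.

```python
from fractions import Fraction

BOX = "□"  # hole of a context


def substitute(c, t):
    """c[t]: replace the unique BOX leaf of the context c by t."""
    if isinstance(c, str):
        return t if c == BOX else c
    return (c[0],) + tuple(substitute(child, t) for child in c[1:])


def run(delta, t):
    """Partial h_mu for a dwta given as (sigma, (q1, ..., qk)) -> (target, weight)."""
    if isinstance(t, str):
        return (t, Fraction(1))
    weight, states = Fraction(1), []
    for child in t[1:]:
        result = run(delta, child)
        if result is None:
            return None
        states.append(result[0])
        weight *= result[1]
    entry = delta.get((t[0], tuple(states)))
    return None if entry is None else (entry[0], weight * entry[1])


def is_sign_of_life(delta, final, c, q):
    """c is a sign of life for q iff h_mu^(1)(c[q]) is defined and lies in F."""
    result = run(delta, substitute(c, q))
    return result is not None and result[0] in final


delta = {("gamma", ("qb",)): ("qf", 8), ("sigma", ("qb", "q1")): ("q2", 4)}
final = {"q1", "qf"}
print(is_sign_of_life(delta, final, ("gamma", BOX), "qb"))  # True
print(is_sign_of_life(delta, final, ("gamma", BOX), "q1"))  # False: gamma(q1) is undefined
print(is_sign_of_life(delta, final, BOX, "q1"))             # True: □ works for final states
```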

The following lemma justifies that we can compute signs of life for equivalence classes of congruences that respect $F$ instead of individual states since all states of such an equivalence class share the same signs of life.

Lemma 1 (see [Mal09, Lemma 9]).

We have $\mathord{\cong}\subseteq\mathord{\sim_{M}}$ for every congruence $\cong$ that respects $F$ . In particular, $\mathord{\equiv_{M}}\subseteq\mathord{\sim_{M}}$ . Moreover, every sign of life for $q\in Q$ is also a sign of life for every $q^{\prime}\in[q]_{\cong}$ .

Proof.

It is known that $\sim_{M}$ is the coarsest congruence that respects $F$ [GS84, Theorem II.6.10].⁵ Consequently, $\mathord{\cong}\subseteq\mathord{\sim_{M}}$ and $\mathord{\equiv_{M}}\subseteq\mathord{\sim_{M}}$ since we already remarked that $\equiv_{M}$ is also a congruence that respects $F$. Based on the definition of $\sim_{M}$ it is trivial to see that all elements of an equivalence class of $\sim_{M}$ share the same signs of life [Mal09, Lemma 9]. Since $[q]_{\cong}\subseteq[q]_{\sim_{M}}$, we obtain the desired statement.

⁵ Recall that $\sim_{M}$ coincides with classical equivalence on $\operatorname{unw}(M)$ and that our notion of congruence completely disregards the weights.

Algorithm 1 simply attempts to reach all states from the final states, computing along the way a context that takes each state to a final state (i.e., a sign of life) as well as its weight. Due to Lemma 1, the signs of life are computed for equivalence classes (or blocks) instead of individual states. Now let us explain Algorithm 1 in detail. Every final state $q\in F$ is trivially live as evidenced by the trivial sign of life ${\scriptstyle\Box}$. Since the congruence $\cong$ respects $F$, the set $(F/\mathord{\cong})$ consists of equivalence classes that contain only final states. We set the sign of life for each such class to ${\scriptstyle\Box}$ [see Line 3], and for each involved state $q$ we set its pushing weight to $1$ [see Line 4]. Overall, this initialization takes time $\mathcal{O}\bigl{(}\lvert F\rvert\bigr{)}$. Next, we add all those blocks to the set $L$ of live blocks and to the set $U$ of blocks yet to be explored. As long as there are still unexplored blocks, we select a block $B$ from $U$ and remove it from $U$. Then we consider all transitions that end in a state that belongs to the block $B$ and check whether such a transition has a source state whose equivalence class is not yet in $L$. For each such source state $q_{i}$, we add its equivalence class $[q_{i}]_{\cong}$ to both $L$ and $U$. Then we set the sign of life for this class to the sign of life for $B$ extended by the considered transition [see Line 12]. Finally, for each state $q\in[q_{i}]_{\cong}$ we compute its pushing weight by multiplying the weight of the considered transition, with $q_{i}$ replaced by $q$, with the already computed pushing weight of the target state reached by this modified transition [see Line 13].
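The description of Algorithm 1 can be sketched in Python as follows. This is an illustrative reconstruction from the prose, not the authors' pseudocode: the dictionary encoding, the `incoming` index, and the FIFO queue used for $U$ are assumptions, and the congruence is passed as a map `block_of` from states to their blocks. The map must describe a congruence that respects $F$; otherwise the lookup of the modified transition may fail. The example dwta is hypothetical; its weights are chosen so that the pushing weights come out as $\lambda(q_{b})=8$ and $\lambda(q_{2})=2$, matching the example of this section.

```python
from collections import defaultdict, deque
from fractions import Fraction

BOX = "□"  # hole of a context


def substitute(c, t):
    """c[t]: replace the unique BOX leaf of the context c by t."""
    if isinstance(c, str):
        return t if c == BOX else c
    return (c[0],) + tuple(substitute(child, t) for child in c[1:])


def signs_of_life(delta, final, block_of):
    """Sketch of Algorithm 1 for a dwta.

    delta:    partial map (sigma, (q1, ..., qk)) -> (target, weight)
    final:    the set F of final states
    block_of: state -> frozenset, the blocks of a congruence that respects F
    Returns (sol, lam): a sign of life per live block, a pushing weight per live state.
    """
    incoming = defaultdict(list)  # target state -> left-hand sides of transitions into it
    for lhs, (target, _) in delta.items():
        incoming[target].append(lhs)

    sol, lam = {}, {}
    live, unexplored = set(), deque()
    for q in final:  # initialization (cf. Lines 3-4)
        block = block_of[q]
        if block not in live:
            live.add(block)
            unexplored.append(block)
            sol[block] = BOX
            for p in block:
                lam[p] = Fraction(1)

    while unexplored:
        B = unexplored.popleft()
        for target in B:
            for symbol, sources in incoming[target]:
                for i, q_i in enumerate(sources):
                    block = block_of[q_i]
                    if block in live:
                        continue
                    live.add(block)
                    unexplored.append(block)
                    # one-step context c = sigma(q1, ..., q_{i-1}, □, q_{i+1}, ..., qk)
                    c = (symbol, *sources[:i], BOX, *sources[i + 1:])
                    sol[block] = substitute(sol[B], c)  # sol(B)[c], cf. Line 12
                    for q in block:  # lam(q) = lam(mu1(c[q])) * mu2(c[q]), cf. Line 13
                        lhs = (symbol, sources[:i] + (q,) + sources[i + 1:])
                        q_target, weight = delta[lhs]
                        lam[q] = lam[q_target] * weight
    return sol, lam


delta = {
    ("alpha", ()): ("q1", 1), ("beta", ()): ("qb", 3),
    ("gamma", ("qb",)): ("qf", 8), ("gamma", ("q2",)): ("qf", 2),
    ("sigma", ("qb", "q1")): ("q2", 4), ("sigma", ("qb", "qf")): ("q2", 4),
    ("sigma", ("q2", "q1")): ("qb", 2), ("sigma", ("q2", "qf")): ("qb", 2),
}
blocks = [frozenset({"q1", "qf"}), frozenset({"q2", "qb"})]
block_of = {q: b for b in blocks for q in b}
sol, lam = signs_of_life(delta, {"q1", "qf"}, block_of)
print(sorted((q, str(w)) for q, w in lam.items()))
# [('q1', '1'), ('q2', '2'), ('qb', '8'), ('qf', '1')]
```

Both states of the block $\{q_{2},q_{b}\}$ receive the same sign of life $\gamma({\scriptstyle\Box})$, while their pushing weights differ, which is exactly the information pushing needs.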

Theorem 2.

Algorithm 1 is correct and runs in time $\mathcal{O}\bigl{(}\lvert M\rvert+\lvert Q\rvert\bigr{)}$ .

Proof.

Since our algorithm is similar to the one of [Mal09], our proof closely resembles the proofs of [Mal09, Lemma 10 and Theorem 11] adjusted to equivalence classes. We already argued that the initialization runs in time $\mathcal{O}\bigl{(}\lvert F\rvert\bigr{)}\subseteq\mathcal{O}\bigl{(}\lvert Q\rvert\bigr{)}$. It is easy to see that $U\subseteq L$ at all times in the main loop [Lines 6–13] of the algorithm. Consequently, each block can be added at most once to $U$ since it is added at the same time to $L$ and only blocks not in $L$ can be added to $U$. This yields that the main loop executes at most $\lvert(Q/\mathord{\cong})\rvert\leq\lvert Q\rvert$ times. The inner loop [Lines 9–13] can execute at most $\lvert M\rvert$ times since each transition is considered at most once in the middle loop and at most once for each source state of the transition. The statements in the inner loop all execute in constant time except for Line 13, which is executed at most once for each state $q\in Q$ in total. Overall, we thus obtain the running time $\mathcal{O}\bigl{(}\lvert M\rvert+\lvert Q\rvert\bigr{)}$.

Now let us prove the post-conditions. By Lemma 1 we know that signs of life are shared between elements in an equivalence class of $\cong$ . The remaining statements are proved by induction along the outer main loop. Initially, we set

$\lambda(q)=1=h_{\mu}^{(2)}(q)=h_{\mu}^{(2)}({\scriptstyle\Box}[q])$

by Lines 3–4, which proves the post-condition because $\operatorname{sol}([q]_{\cong})={\scriptstyle\Box}$. In the main loop, we set $\lambda(q)=\lambda(\mu^{(1)}(c[q]))\cdot\mu^{(2)}(c[q])$ in Line 13. The state $q^{\prime}=\mu^{(1)}(c[q])$ belongs to the block $B$, which was added to $L$ (and whose states received their pushing weights) in an earlier step: since $q\cong q_{i}$, the congruence property yields $\mu^{(1)}(c[q])\cong\mu^{(1)}(c[q_{i}])$, and the latter is in $B$, so the former is in $B$ as well. Consequently, we can employ the induction hypothesis and obtain $\lambda(q^{\prime})=h_{\mu}^{(2)}(\operatorname{sol}(B)[q^{\prime}])$. In addition,

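$\lambda(q)=\lambda(q^{\prime})\cdot\mu^{(2)}(c[q])=h_{\mu}^{(2)}(\operatorname{sol}(B)[q^{\prime}])\cdot\mu^{(2)}(c[q])=h_{\mu}^{(2)}\bigl(\operatorname{sol}(B)[c[q]]\bigr)=h_{\mu}^{(2)}\bigl((\operatorname{sol}(B)[c])[q]\bigr)\enspace,$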

which proves the post-condition because $\operatorname{sol}([q]_{\cong})=\operatorname{sol}([q_{i}]_{\cong})=\operatorname{sol}(B)[c]$ by Line 12. Clearly, $\operatorname{sol}([q]_{\cong})$ is a sign of life for $q$, which proves that $q$ is live. Finally, suppose that there is a live state $q\in Q$ such that $[q]_{\cong}\notin L$ (i.e., we assume a live state that is not classified as such by Algorithm 1). Since it is live, it has a sign of life $c\in C_{\Sigma}(Q)$. By induction on $c$ we can prove that, when processing $c[q]$, there exists a transition that uses a source state $q_{i}$ such that $[q_{i}]_{\cong}\notin L$, whereas its target state $q^{\prime}$ satisfies $[q^{\prime}]_{\cong}\in L$.⁶ Since the algorithm terminates only when $U$ is empty, every block in $L$, in particular $[q^{\prime}]_{\cong}$, has been explored. Hence this transition was processed by the algorithm, which means that the equivalence class $[q_{i}]_{\cong}$ was added to $L$. This contradicts the assumption, which shows that all states that are not represented in $L$ are indeed dead.

⁶ Such a switch must exist because all the final states are represented in $L$.

Example.

Our example dwta $N=(Q,\Sigma,\mu,F)$ is depicted on the left of Figure 1. Each transition is drawn as a small circle annotated with its input symbol and its weight, separated by a colon; the arrow leads to the target state, and the source states $q_{1},\dotsc,q_{k}$ are arranged counter-clockwise starting from the target arrow. For example, the bottom center transition labeled $\sigma:4$ in the left dwta of Figure 1 corresponds to $\mu(\sigma(q_{b},q_{1})\to q_{2})=4$; i.e., its target state is $q_{2}$, its symbol is $\sigma$, its source states are $q_{b}q_{1}$ (in this order), and its weight is $4$. As usual, final states are doubly circled. The graphical representation of wta is explained in detail in [Bor05]. The coarsest congruence $\cong$ respecting $F=\{q_{1},q_{f}\}$ is represented by the partition $\{\{q_{1},q_{f}\},\,\{q_{2},q_{b}\}\}$ into equivalence classes, and we use this congruence in Algorithm 1. First, the block $F$ of final states is marked as live and added to $U$. It is assigned the trivial context ${\scriptstyle\Box}$ as sign of life, and each final state is assigned the trivial weight $1$. Clearly, we can only select the equivalence class $B=F$ in the main loop. Let us consider the transition $\gamma(q_{b})\to q_{f}$, whose target state $q_{f}$ is in $B$. Since $[q_{b}]_{\cong}=\{q_{2},q_{b}\}$ has not yet been marked as live, we add it to both $L$ and $U$, and we set its sign of life to $\gamma({\scriptstyle\Box})$. Finally, we set the pushing weights to $\lambda(q_{b})=\lambda(q_{f})\cdot\mu^{(2)}(\gamma(q_{b})\to q_{f})=8$ and $\lambda(q_{2})=\lambda(q_{f})\cdot\mu^{(2)}(\gamma(q_{2})\to q_{f})=2$. Now all states are live, so the loops terminate. Consequently, we have computed all signs of life and the pushing weights

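$$\operatorname{sol}(\{q_{1},q_{f}\})={\scriptstyle\Box}\enspace,\quad\operatorname{sol}(\{q_{2},q_{b}\})=\gamma({\scriptstyle\Box})\enspace,\quad\lambda(q_{1})=\lambda(q_{f})=1\enspace,\quad\lambda(q_{2})=2\enspace,\quad\lambda(q_{b})=8\enspace.$$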
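The backward exploration performed by Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not a transcription of Algorithm 1: a dwta is a dict mapping `(symbol, source_states)` to `(target_state, weight)`, the congruence is passed in as a list of frozensets (for instance obtained by unweighted minimization), exact rationals (`Fraction`) stand in for the semifield, and a context is a nested tuple whose hole is the string `"□"`. The names `plug` and `signs_of_life` and the small transition table, which only reuses the two γ-transitions with weights 8 and 2 from the example, are hypothetical.

```python
from collections import deque
from fractions import Fraction

HOLE = "□"  # the hole of a one-hole context

def plug(context, tree):
    """Return context[tree], i.e., replace the unique hole of `context` by `tree`."""
    if context == HOLE:
        return tree
    if isinstance(context, tuple):
        sym, args = context
        return (sym, tuple(plug(arg, tree) for arg in args))
    return context  # a state: contains no hole

def signs_of_life(trans, blocks, final):
    """Explore the classes of a congruence backwards, starting from the final states.

    trans:  dwta as {(symbol, source_states): (target_state, weight)}
    blocks: equivalence classes (frozensets), assumed to form a congruence respecting `final`
    Returns (sol, lam): a sign of life per live class and a pushing weight per live state.
    """
    block_of = {q: b for b in blocks for q in b}
    sol, lam, queue = {}, {}, deque()
    for b in blocks:
        if b <= final:  # final classes: trivial context, trivial weight 1
            sol[b] = HOLE
            lam.update({q: Fraction(1) for q in b})
            queue.append(b)
    while queue:
        B = queue.popleft()
        for (sym, srcs), (target, _) in trans.items():
            if target not in B:
                continue
            for i, qi in enumerate(srcs):
                Bi = block_of[qi]
                if Bi in sol:  # class already known to be live
                    continue
                ctx = (sym, srcs[:i] + (HOLE,) + srcs[i + 1:])
                sol[Bi] = plug(sol[B], ctx)  # sol(B)[c]
                for q in Bi:  # lambda(q) = lambda(mu1(c[q])) * mu2(c[q])
                    tgt, w = trans[(sym, srcs[:i] + (q,) + srcs[i + 1:])]
                    lam[q] = lam[tgt] * w
                queue.append(Bi)
    return sol, lam

# Hypothetical dwta reusing only the gamma-transitions of the example
trans = {
    ("alpha", ()): ("qb", Fraction(1)),
    ("beta", ()): ("q1", Fraction(1)),
    ("gamma", ("qb",)): ("qf", Fraction(8)),
    ("gamma", ("q2",)): ("qf", Fraction(2)),
}
blocks = [frozenset({"q1", "qf"}), frozenset({"q2", "qb"})]
sol, lam = signs_of_life(trans, blocks, {"q1", "qf"})
# sol of {q2, qb} is ("gamma", ("□",)); lam: q1, qf -> 1, q2 -> 2, qb -> 8
```

On this toy input, the sketch reproduces the signs of life and pushing weights computed in the example.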

4. Pushing

The Myhill-Nerode congruence requires that there is a unique scaling factor for every pair $(q,q^{\prime})$ of equivalent states. Thus, any fixed sign of life $c$ for both $q$ and $q^{\prime}$ [for which $\chi_{F}(h_{\mu}^{(1)}(c[q]))=1=\chi_{F}(h_{\mu}^{(1)}(c[q^{\prime}]))$ ] yields non-zero weights $h_{\mu}^{(2)}(c[q])$ and $h_{\mu}^{(2)}(c[q^{\prime}])$ , which can be used to determine this unique scaling factor between $q$ and $q^{\prime}$ . In fact, we already computed those weights $\lambda(q)$ and $\lambda(q^{\prime})$ in Algorithm 1. By Lemma 1, states that are not weakly equivalent (and thus might not have the same sign of life after executing Algorithm 1 with $\sim_{M}$ ) also cannot be equivalent. For the remaining pairs of live states, we computed a sign of life $\operatorname{sol}([q]_{\sim_{M}})$ for the equivalence class $[q]_{\sim_{M}}$ of $q$ in the previous section. In addition, we computed pushing weights $\lambda(q)$ and $\lambda(q^{\prime})$ . Now, we will use these weights to normalize the wta by pushing [Moh97, Eis03, PG09]. Intuitively, pushing cancels the scaling factor for equivalent states, which we will prove in the next section. In general, it just redistributes weights along the transitions. In weighted (finite-state) string automata [Sak09], pushing is performed from the final states towards the initial states [Moh97]. Since we work with bottom-up wta [Bor05] (i.e., our notion of determinism is bottom-up), this works analogously here by moving weights from the root towards the leaves. However, we introduce our notion of pushing for arbitrary, not necessarily deterministic wta. To this end, we lift the corresponding definition [Moh97, page 296] from string to tree automata.

In this section, let $M=(Q,\Sigma,\mu,F)$ be an arbitrary wta and $\lambda\colon Q\to S\setminus\{0\}$ be an arbitrary mapping such that $\lambda(q)=1$ for every $q\in F$ .

Definition.

The pushed wta $\operatorname{push}_{\lambda}(M)$ is $(Q,\Sigma,\mu^{\prime},F)$ such that for every $\sigma\in\Sigma_{k}$ and $q,q_{1},\dotsc,q_{k}\in Q$

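$$\mu^{\prime}\bigl(\sigma(q_{1},\dotsc,q_{k})\to q\bigr)=\lambda(q)\cdot\mu\bigl(\sigma(q_{1},\dotsc,q_{k})\to q\bigr)\cdot\prod_{i=1}^{k}\lambda(q_{i})^{-1}\enspace.$$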

The mapping $\lambda$ indicates the pushed weights. It is non-zero everywhere and has to be $1$ for final states because our model does not have final weights. (As already mentioned, the restriction to final states is a convenience and not an essential restriction.) In the pushed wta $\operatorname{push}_{\lambda}(M)$, the weight of every transition leading to the state $q\in Q$ is obtained from the weight of the corresponding transition in $M$ by multiplying it with $\lambda(q)$. To compensate, the weight of every transition leaving the state $q$ is multiplied with $\lambda(q)^{-1}$, which cancels $\lambda(q)$. Thus, we expect pushing to yield an equivalent wta, and we confirm this by showing that $M$ and $\operatorname{push}_{\lambda}(M)$ are indeed equivalent. The corresponding statement for string automata is [Moh97, Lemma 4].
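The definition of $\operatorname{push}_{\lambda}$ and the equivalence claimed in Proposition 3 can be checked numerically. The Python sketch below assumes exact rationals (`Fraction`) as the commutative semifield, represents a possibly nondeterministic wta as a list of `(symbol, sources, target, weight)` tuples and a tree as a `(symbol, children)` tuple, and evaluates $h_{\mu}(t\to q)$ by summing over all choices of source states. The names `push`, `run` and `weight`, the example automaton and the mapping `lam` are hypothetical.

```python
from fractions import Fraction
from math import prod

def push(trans, lam):
    """mu'(sigma(q1..qk) -> q) = lam(q) * mu(sigma(q1..qk) -> q) * prod_i lam(qi)^-1."""
    return [(sym, srcs, q, lam[q] * w / prod((lam[p] for p in srcs), start=Fraction(1)))
            for sym, srcs, q, w in trans]

def run(trans, tree):
    """Return {q: h_mu(tree -> q)} for a possibly nondeterministic wta."""
    sym, children = tree
    vecs = [run(trans, child) for child in children]
    out = {}
    for s, srcs, q, w in trans:
        if s != sym or len(srcs) != len(children):
            continue
        for vec, p in zip(vecs, srcs):  # multiply in the weights of the subtrees
            w *= vec.get(p, Fraction(0))
        out[q] = out.get(q, Fraction(0)) + w
    return out

def weight(trans, final, tree):
    """Weight assigned to `tree`: sum of h_mu(tree -> q) over the final states q."""
    vec = run(trans, tree)
    return sum((vec.get(q, Fraction(0)) for q in final), Fraction(0))

# Hypothetical nondeterministic wta: the constant a reaches both p and q
trans = [
    ("a", (), "p", Fraction(2)),
    ("a", (), "q", Fraction(3)),
    ("g", ("p",), "q", Fraction(5)),
    ("s", ("p", "q"), "f", Fraction(7)),
    ("s", ("q", "q"), "f", Fraction(1, 2)),
]
final = {"f"}
lam = {"p": Fraction(3), "q": Fraction(1, 4), "f": Fraction(1)}  # lam(f) = 1 since f is final
pushed = push(trans, lam)

a = ("a", ())
trees = [a, ("g", (a,)), ("s", (a, a)), ("s", (a, ("g", (a,))))]
print([str(weight(trans, final, t)) for t in trees])   # ['0', '0', '93/2', '155']
print([str(weight(pushed, final, t)) for t in trees])  # the same list
```

The individual transition weights change (e.g., $s(q,q)\to f$ goes from $1/2$ to $8$), but the weights of the trees do not.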

Proposition 3.

The wta $M$ and $\operatorname{push}_{\lambda}(M)$ are equivalent. Moreover, if $M$ is deterministic, then so is $\operatorname{push}_{\lambda}(M)$ .

Proof 4.1.

Let $\operatorname{push}_{\lambda}(M)=M^{\prime}=(Q,\Sigma,\mu^{\prime},F)$. The preservation of determinism is obvious because $\operatorname{supp}(\mu^{\prime})\subseteq\operatorname{supp}(\mu)$. (In fact, $\operatorname{supp}(\mu^{\prime})=\operatorname{supp}(\mu)$ because semifields are zero-divisor free [Bor03, Lemma 1].) We prove that $h_{\mu^{\prime}}(t\to q)=\lambda(q)\cdot h_{\mu}(t\to q)$ for every $t\in T_{\Sigma}$ and $q\in Q$ by induction on $t$. Let $t=\sigma(t_{1},\dotsc,t_{k})$ for some $\sigma\in\Sigma_{k}$ and $t_{1},\dotsc,t_{k}\in T_{\Sigma}$. By the induction hypothesis, we have $h_{\mu^{\prime}}(t_{i}\to q_{i})=\lambda(q_{i})\cdot h_{\mu}(t_{i}\to q_{i})$ for every $i\in[1,k]$ and $q_{i}\in Q$. Consequently,

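$$\begin{aligned}h_{\mu^{\prime}}(t\to q)&=\sum_{q_{1},\dotsc,q_{k}\in Q}\mu^{\prime}\bigl(\sigma(q_{1},\dotsc,q_{k})\to q\bigr)\cdot\prod_{i=1}^{k}h_{\mu^{\prime}}(t_{i}\to q_{i})\\&=\sum_{q_{1},\dotsc,q_{k}\in Q}\lambda(q)\cdot\mu\bigl(\sigma(q_{1},\dotsc,q_{k})\to q\bigr)\cdot\prod_{i=1}^{k}\lambda(q_{i})^{-1}\cdot\lambda(q_{i})\cdot h_{\mu}(t_{i}\to q_{i})\\&=\lambda(q)\cdot h_{\mu}(t\to q)\enspace.\end{aligned}$$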

We complete the proof as follows.

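$$M^{\prime}(t)=\sum_{q\in F}h_{\mu^{\prime}}(t\to q)=\sum_{q\in F}\lambda(q)\cdot h_{\mu}(t\to q)=\sum_{q\in F}h_{\mu}(t\to q)=M(t)\quad\text{for every }t\in T_{\Sigma}\enspace,$$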

because $\lambda(q)=1$ for every $q\in F$ .

Theorem 4.

The wta $\operatorname{push}_{\lambda}(M)$ is equivalent to $M$ and can be obtained in time $\mathcal{O}\bigl{(}\lvert M\rvert\bigr{)}$ .

Example.

Let us return to our example dwta $N$ on the left of Figure 1 and perform pushing. The pushing weights $\lambda$ are given in Example 3. We consider the transition $\sigma(q_{b},q_{f})\to q_{2}$, which has weight $4$ in $N$. In $\operatorname{push}_{\lambda}(N)$, this transition has the weight

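$$\mu^{\prime}\bigl(\sigma(q_{b},q_{f})\to q_{2}\bigr)=\lambda(q_{2})\cdot 4\cdot\lambda(q_{b})^{-1}\cdot\lambda(q_{f})^{-1}=2\cdot 4\cdot\tfrac{1}{8}\cdot 1=1\enspace.$$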

The dwta $\operatorname{push}_{\lambda}(N)$ is presented on the right of Figure 1. With a little effort, we can confirm that $q_{2}$ and $q_{b}$ are equivalent in $\operatorname{push}_{\lambda}(N)$, whereas $q_{1}$ and $q_{f}$ are not.

5. Minimization

Our main application of weight pushing is efficient dwta minimization, which we present next. The overall structure of our minimization procedure is presented in Algorithm 2. As mentioned earlier, the coarsest congruence $\sim_{M}$ for a dwta $M=(Q,\Sigma,\mu,F)$ that respects $F$ can be obtained by minimization [HMM09] of $\operatorname{unw}(M)$ . We call this procedure ComputeCoarsestCongruence and supply it with a dwta $M$ and an equivalence relation. It returns the coarsest congruence (of $M$ ) that refines the given equivalence relation.

Let $M=(Q,\Sigma,\mu,F)$ be a dwta (without useless states) and $\lambda\colon Q\to S\setminus\{0\}$ be the pushing weights computed by Algorithm 1 when run on $M$ and $\sim_{M}$. (In a dwta without useless states we have $\lvert Q\rvert\leq\lvert M\rvert$.) In addition, we let $\operatorname{push}_{\lambda}(M)=M^{\prime}=(Q,\Sigma,\mu^{\prime},F)$.

The dwta $M^{\prime}$ has the property that $(\mu^{\prime})^{(2)}(\sigma(q_{1},\dotsc,q_{k}))=(\mu^{\prime})^{(2)}(\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{k}))$ for all $\sigma\in\Sigma_{k}$ and states $q_{i}\equiv_{M}q^{\prime}_{i}$ for every $i\in[1,k]$. We will prove this property (1) in Lemma 5. It is this property that, in analogy to the string case [Moh97, Eis03], allows us to compute the equivalence $\mathord{\equiv_{M}}=\mathord{\sim_{N}}$ on an unweighted fta $N$, in which we treat the transition weight as part of the input symbol. For example, the algorithm of [HMM09] can then be used to compute $\mathord{\sim_{N}}$. Finally, we merge the equivalent states using the information about the scaling factors contained in the pushing weights $\lambda$ in the same way as in [Mal09]. Let us start with the formal definitions. (We avoid changing the weight structure from our semifield to the Boolean semifield $\mathbb{B}$ since the multiplicative submonoid induced by $\{0,1\}$ is isomorphic to the multiplicative monoid of $\mathbb{B}$. Thus, our dwta with weights in $\{0,1\}$ compute in the same manner as a dwta over $\mathbb{B}$, or equivalently, as a deterministic fta.)

Definition.

Let $M=(Q,\Sigma,\mu,F)$ be a dwta, and let $S^{\prime}=\{\mu(\tau)\mid\tau\in\operatorname{supp}(\mu)\}$ be the finite set of non-zero weights that occur as transition weights in $M$ . The alphabetic dwta $\operatorname{alph}(M)$ for $M$ is $(Q,\Sigma\times S^{\prime},\mu^{\prime\prime},F)$ , where

•

$\operatorname{rk}(\langle\sigma,s\rangle)=\operatorname{rk}(\sigma)$ for every $\sigma\in\Sigma$ and $s\in S^{\prime}$ ,

•

$\mu^{\prime\prime}(\tau)=1$ for every $\tau\in\operatorname{supp}(\mu^{\prime\prime})$ , and

•

for every $\sigma\in\Sigma_{k}$ , $s\in S^{\prime}$ , and $q,q_{1},\dotsc,q_{k}\in Q$

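$$\mu^{\prime\prime}\bigl(\langle\sigma,s\rangle(q_{1},\dotsc,q_{k})\to q\bigr)=\begin{cases}1&\text{if }\mu\bigl(\sigma(q_{1},\dotsc,q_{k})\to q\bigr)=s\enspace,\\0&\text{otherwise}\enspace.\end{cases}$$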

Clearly, the construction of $\operatorname{alph}(M)$ can be performed in time $\mathcal{O}\bigl{(}\lvert M\rvert\bigr{)}$. Next, we show that the equivalence $\equiv_{M}$ in $M$ coincides with the equivalence $\sim_{\operatorname{alph}(M^{\prime})}$ in $\operatorname{alph}(M^{\prime})$, where $M^{\prime}=\operatorname{push}_{\lambda}(M)$, by proving both inclusions.
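The reduction to the unweighted case can be illustrated with a small Python sketch: `alph` moves each weight into the input symbol, and a naive partition refinement then computes the coarsest congruence that respects the final states. This refinement enumerates all one-hole contexts over the current states and is only meant for tiny inputs; it is not the $\mathcal{O}(\lvert M\rvert\log\lvert Q\rvert)$ algorithm of [HMM09]. It treats an undefined transition as a distinct outcome, which we assume is adequate for dwta without useless states. The function names, the hypothetical dwta `before`, and its pushed version `after` (computed by hand with $\lambda(q_{b})=8$, $\lambda(q_{2})=2$ and $\lambda(q_{1})=\lambda(q_{f})=1$) are our own assumptions.

```python
from itertools import product

def alph(dwta):
    """alph(M): move each transition weight into the input symbol <sigma, s>."""
    return {((sym, w), srcs): tgt for (sym, srcs), (tgt, w) in dwta.items()}

def coarsest_congruence(delta, states, final):
    """Naive refinement: coarsest congruence of a deterministic fta that respects `final`.

    delta: {(symbol, sources): target}; an undefined transition is its own outcome (None).
    """
    symbols = sorted({(sym, len(srcs)) for sym, srcs in delta}, key=repr)
    blocks = [b for b in (frozenset(final), frozenset(states) - frozenset(final)) if b]
    while True:
        block_id = {q: n for n, b in enumerate(blocks) for q in b}

        def signature(q):
            # outcome of every one-hole context sigma(..., □, ...) with the other arguments in Q
            sig = []
            for sym, k in symbols:
                for i in range(k):
                    for others in product(sorted(states), repeat=k - 1):
                        tgt = delta.get((sym, others[:i] + (q,) + others[i:]))
                        sig.append(None if tgt is None else block_id[tgt])
            return tuple(sig)

        refined = []
        for b in blocks:
            groups = {}
            for q in b:
                groups.setdefault(signature(q), set()).add(q)
            refined.extend(frozenset(g) for g in groups.values())
        if len(refined) == len(blocks):  # no block was split: the partition is stable
            return set(refined)
        blocks = refined

states, final = {"q1", "qf", "q2", "qb"}, {"q1", "qf"}
# Hypothetical dwta whose pushing weights are lambda(qb) = 8, lambda(q2) = 2, lambda(q1) = lambda(qf) = 1
before = {("alpha", ()): ("qb", 1), ("beta", ()): ("q1", 1),
          ("gamma", ("qb",)): ("qf", 8), ("gamma", ("q2",)): ("qf", 2),
          ("delta", ("qb",)): ("q2", 4), ("delta", ("q2",)): ("q2", 1)}
# The same dwta after pushing: every weight became lam(target) * w / prod(lam(sources))
after = {("alpha", ()): ("qb", 8), ("beta", ()): ("q1", 1),
         ("gamma", ("qb",)): ("qf", 1), ("gamma", ("q2",)): ("qf", 1),
         ("delta", ("qb",)): ("q2", 1), ("delta", ("q2",)): ("q2", 1)}
part_before = coarsest_congruence(alph(before), states, final)  # q2 and qb stay apart
part_after = coarsest_congruence(alph(after), states, final)    # q2 and qb are merged
```

Without pushing, the different weights on the γ- and δ-transitions keep $q_{2}$ and $q_{b}$ apart even though they are equivalent in the weighted sense (with scaling factor 4); after pushing, the corresponding transitions carry equal weights, so the unweighted refinement merges them.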

Lemma 5.

The congruence $\equiv_{M}$ of $M$ is a congruence of $\operatorname{alph}(M^{\prime})$ that respects $F$ .

Proof 5.1.

Let $\operatorname{alph}(M^{\prime})=(Q,\Sigma\times S^{\prime},\mu^{\prime\prime},F)$ . Since $M$ and $\operatorname{alph}(M^{\prime})$ have the same final states $F$ , $\equiv_{M}$ trivially respects $F$ because it is a congruence of $M$ that respects $F$ . Naturally, $\equiv_{M}$ is an equivalence, so it remains to prove the congruence property for $\operatorname{alph}(M^{\prime})$ . Let $\sigma\in\Sigma_{k}$ and $q_{i}\equiv_{M}q^{\prime}_{i}$ for every $i\in[1,k]$ . Then

[TABLE]

because $\equiv_{M}$ is a congruence of $M$. For the moment, let us assume that (note that we compare the weights in $M^{\prime}=\operatorname{push}_{\lambda}(M)$ here)

[TABLE]

then

[TABLE]

For all the remaining combinations of $\langle\sigma,s^{\prime}\rangle$ we have that both $(\mu^{\prime\prime})^{(1)}(\langle\sigma,s^{\prime}\rangle(q_{1},\dotsc,q_{k}))$ and $(\mu^{\prime\prime})^{(1)}(\langle\sigma,s^{\prime}\rangle(q^{\prime}_{1},\dotsc,q^{\prime}_{k}))$ are undefined and thus equal. We have thus proved the congruence property given the assumption. Consequently, it remains to show that the assumption

[TABLE]

is true. By Definition 4, we have

[TABLE]

Now we prove that

[TABLE]

for every $j\in[1,k]$ , where $c_{j}=\sigma(q^{\prime}_{1},\dotsc,q^{\prime}_{j-1},{\scriptstyle\Box},q_{j+1},\dotsc,q_{k})$ . Let $p_{j}=\mu^{(1)}(c_{j}[q_{j}])$ and $p^{\prime}_{j}=\mu^{(1)}(c_{j}[q^{\prime}_{j}])$ . Since $q_{j}\equiv_{M}q^{\prime}_{j}$ , we also have that $p_{j}\equiv_{M}p^{\prime}_{j}$ because $\equiv_{M}$ is a congruence of $M$ . This yields that $p_{j}\sim_{M}p^{\prime}_{j}$ by Lemma 1. Let $c=\operatorname{sol}([p_{j}]_{\sim_{M}})$ be a sign of life for both $p_{j}$ and $p^{\prime}_{j}$ . Moreover, we have a constant scaling factor between the equivalent states $q_{j}$ and $q^{\prime}_{j}$ , which yields

[TABLE]

where $(\dagger)$ holds because $c[c_{j}]$ is a sign of life for both $q_{j}$ and $q^{\prime}_{j}$ and $(\ddagger)$ holds essentially by definition. With these equations, let us inspect the main equality.

[TABLE]

Now we are ready to return to the proof obligation expressed in (1). We apply (4) a total of $k$ times to obtain the desired statement.

[TABLE]

which completes the proof.

Theorem 6.

We have $\mathord{\equiv_{M}}=\mathord{\sim_{N}}$ , where $N=\operatorname{alph}(M^{\prime})$ .

Proof 5.2.

Lemma 5 shows that $\equiv_{M}$ is a congruence of $N$ that respects $F$ . Since $\sim_{N}$ is the coarsest congruence of $N$ that respects $F$ by [GS84, Theorem II.6.10], we obtain that $\mathord{\equiv_{M}}\subseteq\mathord{\sim_{N}}$ . The converse is simple to prove as states that are weakly equivalent in $\operatorname{alph}(M^{\prime})$ share exactly the same signs of life with the scaling factor $1$ . Since the signs of life already indicate the transition weights, we immediately obtain that such weakly equivalent states in $\operatorname{alph}(M^{\prime})$ have corresponding transitions with equal transition weights in $M^{\prime}$ , which proves that those states are also equivalent in $M^{\prime}$ with the scaling factor $1$ . The latter statement can then be used to prove that they are also equivalent in $M$ (with a scaling factor that is potentially different from $1$ ).

The previously fastest dwta minimization algorithm [Mal09] runs in time $\mathcal{O}\bigl{(}\lvert M\rvert\cdot\lvert Q\rvert\bigr{)}$. Our approach, which relies on pushing and is presented in Algorithm 2, achieves the run-time $\mathcal{O}\bigl{(}\lvert M\rvert\log{\lvert Q\rvert}\bigr{)}$, which matches that of the fastest minimization algorithms for deterministic fta.

Corollary 7 (see Algorithm 2).

For every dwta $M=(Q,\Sigma,\mu,F)$ , we can compute an equivalent minimal dwta in time $\mathcal{O}\bigl{(}\lvert M\rvert\log{\lvert Q\rvert}\bigr{)}$ .
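Because equivalent states of $\operatorname{push}_{\lambda}(M)$ have scaling factor $1$, merging them can be sketched as taking the quotient of the pushed dwta by $\equiv_{M}$. The following Python sketch is an illustration under our own assumptions, not the paper's Algorithm 2: a dwta is a dict from `(symbol, sources)` to `(target, weight)`, weights are exact rationals, and the hypothetical dwta `M`, its pushing weights `lam` and its equivalence classes `blocks` are supplied by hand. The helper raises an error if two merged transitions disagree, which property (1) rules out for the true equivalence.

```python
from fractions import Fraction
from math import prod

def quotient(pushed, blocks, final):
    """Merge each block of a pushed dwta into one state (equivalent states have scaling factor 1)."""
    block_of = {q: b for b in blocks for q in b}
    merged = {}
    for (sym, srcs), (tgt, w) in pushed.items():
        key = (sym, tuple(block_of[p] for p in srcs))
        value = (block_of[tgt], w)
        if merged.setdefault(key, value) != value:  # would violate property (1)
            raise ValueError(f"inconsistent transitions for symbol {sym!r}")
    return merged, {block_of[q] for q in final}

def evaluate(dwta, final, tree):
    """Weight of `tree` in a dwta {(sym, srcs): (tgt, w)}; 0 if the run fails or ends non-final."""
    def run(t):
        sym, children = t
        results = [run(c) for c in children]
        if None in results:
            return None
        hit = dwta.get((sym, tuple(state for state, _ in results)))
        if hit is None:
            return None
        tgt, w = hit
        return tgt, w * prod(cw for _, cw in results)
    r = run(tree)
    return r[1] if r is not None and r[0] in final else 0

# Hypothetical dwta M with pushing weights lam and equivalence classes blocks
M = {("alpha", ()): ("qb", Fraction(1)), ("beta", ()): ("q1", Fraction(1)),
     ("gamma", ("qb",)): ("qf", Fraction(8)), ("gamma", ("q2",)): ("qf", Fraction(2)),
     ("delta", ("qb",)): ("q2", Fraction(4)), ("delta", ("q2",)): ("q2", Fraction(1))}
final = {"q1", "qf"}
lam = {"q1": Fraction(1), "qf": Fraction(1), "q2": Fraction(2), "qb": Fraction(8)}
pushed = {(sym, srcs): (tgt, lam[tgt] * w / prod(lam[p] for p in srcs))
          for (sym, srcs), (tgt, w) in M.items()}
blocks = [frozenset({"q1", "qf"}), frozenset({"q2", "qb"})]
minimal, minimal_final = quotient(pushed, blocks, final)

a = ("alpha", ())
trees = [a, ("beta", ()), ("gamma", (a,)), ("gamma", (("delta", (a,)),)),
         ("gamma", (("delta", (("delta", (a,)),)),))]
print([evaluate(M, final, t) == evaluate(minimal, minimal_final, t) for t in trees])  # all True
```

The merged dwta has two states and four transitions; the weight $8$ that $M$ collects on the γ- and δ-transitions now sits entirely on the α-transition.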

6. Testing equivalence

In this final section, we want to decide whether two given dwta are equivalent. To this end, let $M=(Q,\Sigma,\mu,F)$ and $M^{\prime}=(Q^{\prime},\Sigma,\mu^{\prime},F^{\prime})$ be dwta. The overall approach is presented in Algorithm 3. First, we compute a correspondence $g\colon Q\to Q^{\prime}$ between states. For every $q\in Q$, we compute a tree $t\in T_{\Sigma}$, called an access tree for $q$, such that $h_{\mu}^{(1)}(t)=q$. If no access tree exists, then $q$ is not reachable and can be deleted. A dwta in which all states are reachable is called accessible. To avoid these details, let us assume that $M$ and $M^{\prime}$ are accessible, which can always be achieved in time $\mathcal{O}\bigl{(}\lvert M\rvert+\lvert M^{\prime}\rvert\bigr{)}$. In this case, we can compute an access tree $a(q)\in T_{\Sigma}$ for every state $q\in Q$ in time $\mathcal{O}\bigl{(}\lvert M\rvert\bigr{)}$ using standard breadth-first search, in which we unfold each state (i.e., explore all transitions leading to it) at most once. To keep the representation efficient, we store the access trees in the format $\Sigma(Q)$, where each state $q\in Q$ refers to its access tree $a(q)$. To obtain the corresponding state $g(q)$, we compute the state of $Q^{\prime}$ that is reached when processing the access tree $a(q)$; formally, $g(q)=h_{\mu^{\prime}}^{(1)}(a(q))$ for every $q\in Q$. This computation can also be performed in time $\mathcal{O}\bigl{(}\lvert M\rvert\bigr{)}$ since we can reuse the results for the subtrees. Consequently, $h_{\mu}^{(1)}(a(q))=q$ and $h_{\mu^{\prime}}^{(1)}(a(q))=g(q)$ for every $q\in Q$, and both the access trees $a\colon Q\to T_{\Sigma}$ and the correspondence $g\colon Q\to Q^{\prime}$ are computed in time $\mathcal{O}\bigl{(}\lvert M\rvert\bigr{)}$.
Next, we compute the coarsest congruences $\sim_{M}$ and $\sim_{M^{\prime}}$ for $M$ and $M^{\prime}$ that respect $F$ and $F^{\prime}$ , respectively, and the signs of life for $M$ .
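The computation of access trees and of the correspondence $g$ can be sketched as follows. As an assumption on our part, the sketch uses a worklist in which each transition counts its source positions whose states have not been reached yet, a common way to keep the work linear (up to hashing); the text above only speaks of a breadth-first search. Access trees are stored in the format $\Sigma(Q)$ as pairs `(symbol, sources)`, and $g(q)$ is computed in discovery order so that the results for subtrees are reused. The function names and both small automata are hypothetical.

```python
from collections import defaultdict, deque

def access_trees(dwta):
    """Access tree a(q) for every reachable state, stored in the format Sigma(Q).

    a[q] = (symbol, sources) means: apply `symbol` to the access trees of `sources`.
    `order` lists the states in discovery order, so sources always precede targets.
    """
    missing, waiting, queue = {}, defaultdict(list), deque()
    for key in dwta:
        _, srcs = key
        missing[key] = len(srcs)          # source positions not reached yet
        for p in srcs:
            waiting[p].append(key)
        if not srcs:
            queue.append(key)
    a, order = {}, []
    while queue:
        key = queue.popleft()
        tgt = dwta[key][0]
        if tgt in a:                      # each state is unfolded at most once
            continue
        a[tgt] = key
        order.append(tgt)
        for other in waiting[tgt]:
            missing[other] -= 1
            if missing[other] == 0:
                queue.append(other)
    return order, a

def correspondence(order, a, dwta2):
    """g(q) = state reached by the second dwta on the access tree a(q)."""
    g = {}
    for q in order:
        sym, srcs = a[q]
        images = tuple(g[p] for p in srcs)  # reuse the results for the subtrees
        hit = None if None in images else dwta2.get((sym, images))
        g[q] = None if hit is None else hit[0]  # None: a(q) has no run in the second dwta
    return g

# Hypothetical dwta M and a hypothetical second dwta M' over the same alphabet
M = {("alpha", ()): ("qb", 1), ("beta", ()): ("q1", 1),
     ("gamma", ("qb",)): ("qf", 8), ("gamma", ("q2",)): ("qf", 2),
     ("delta", ("qb",)): ("q2", 4), ("delta", ("q2",)): ("q2", 1)}
M2 = {("alpha", ()): ("B", 8), ("beta", ()): ("A", 1),
      ("gamma", ("B",)): ("A", 1), ("delta", ("B",)): ("B", 1)}
order, a = access_trees(M)
g = correspondence(order, a, M2)
# order == ['qb', 'q1', 'qf', 'q2'];  g == {'qb': 'B', 'q1': 'A', 'qf': 'A', 'q2': 'B'}
```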

Lemma 8.

Let $M$ and $M^{\prime}$ be equivalent. The correspondence $g\colon Q\to Q^{\prime}$ is compatible with the congruences $\sim_{M}$ and $\sim_{M^{\prime}}$ ; i.e., $g(q)\sim_{M^{\prime}}g(p)$ if and only if $q\sim_{M}p$ for all $q,p\in Q$ . Moreover, for every reachable $q^{\prime}\in Q^{\prime}$ there exists $q\in Q$ such that $g(q)\in[q^{\prime}]_{\sim_{M^{\prime}}}$ . Consequently, $g$ induces a bijection $\overline{g}\colon(Q/\mathord{\sim_{M}})\to(Q^{\prime}/\mathord{\sim_{M^{\prime}}})$ on the equivalence classes.

Proof 6.1.

Let $q,p\in Q$ , and let $t=a(q)$ and $u=a(p)$ be the corresponding access trees. Then

[TABLE]

where $(\star)$ follows from [Mal08, Lemma 4] and $(\dagger)$ follows from the easy fact that $h_{\mu}^{(1)}(c[q])\in F$ if and only if $c[t]\in\operatorname{supp}(M)$ for all $q\in Q$ and $t\in T_{\Sigma}$ such that $h_{\mu}^{(1)}(t)=q$ .

For the second statement, let $q^{\prime}\in Q^{\prime}$ be a reachable state, and let $t\in T_{\Sigma}$ be such that $h_{\mu^{\prime}}^{(1)}(t)=q^{\prime}$ . Clearly, we have

[TABLE]

where $q=h_{\mu}^{(1)}(t)$ . Consequently, using $(\star)$ we obtain $q^{\prime}\sim_{M^{\prime}}g(q)$ .

We just demonstrated that for equivalent dwta the correspondence $g$ always yields a bijection $\overline{g}\colon(Q/\mathord{\sim_{M}})\to(Q^{\prime}/\mathord{\sim_{M^{\prime}}})$ . We can test the compatibility in time $\mathcal{O}\bigl{(}\lvert Q\rvert+\lvert Q^{\prime}\rvert\bigr{)}$ . Next we transfer the signs of life via $\overline{g}$ to the equivalence classes of $\sim_{M^{\prime}}$ and calculate the corresponding pushing weights for all states $q^{\prime}\in Q^{\prime}$ . Since the signs of life can contain states of $Q$ , we need to rename them using the correspondence $g$ , so we use the function $\operatorname{ren}_{g}\colon T_{\Sigma}(Q\cup\{{\scriptstyle\Box}\})\to T_{\Sigma}(Q^{\prime}\cup\{{\scriptstyle\Box}\})$ , which is defined by $\operatorname{ren}_{g}({\scriptstyle\Box})={\scriptstyle\Box}$ , $\operatorname{ren}_{g}(q)=g(q)$ for all $q\in Q$ , and $\operatorname{ren}_{g}(\sigma(t_{1},\dotsc,t_{k}))=\sigma(\operatorname{ren}_{g}(t_{1}),\dotsc,\operatorname{ren}_{g}(t_{k}))$ for all $\sigma\in\Sigma_{k}$ and trees $t_{1},\dotsc,t_{k}\in T_{\Sigma}(Q\cup\{{\scriptstyle\Box}\})$ . We note that $\operatorname{ren}_{g}(c)\in C_{\Sigma}(Q^{\prime})$ for all $c\in C_{\Sigma}(Q)$ .
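A Python sketch of the renaming $\operatorname{ren}_{g}$ and of the compatibility test behind Lemma 8 follows. Assumptions (ours): contexts are nested `(symbol, arguments)` tuples with the hole `"□"`, each congruence is given as a dict mapping a state to its class, and the function names, the correspondence `g`, the class maps and the sign of life $\sigma(q_{b},{\scriptstyle\Box})$ are hypothetical. The test makes one pass over $g$ plus one pass over the classes of $M^{\prime}$, so it runs in time linear in $\lvert Q\rvert+\lvert Q^{\prime}\rvert$ (up to hashing), and returns the induced map $\overline{g}$ on classes or `None`.

```python
HOLE = "□"

def ren(g, c):
    """ren_g: rename every state occurring in the context c via g (the hole stays)."""
    if c == HOLE:
        return HOLE
    if isinstance(c, tuple):
        sym, args = c
        return (sym, tuple(ren(g, arg) for arg in args))
    return g[c]  # c is a state of Q

def induced_bijection(g, block_of, block_of2):
    """Return [q] -> [g(q)] if g is compatible with both congruences, else None."""
    fwd, bwd = {}, {}
    for q, gq in g.items():
        if gq is None:
            return None
        b, b2 = block_of[q], block_of2[gq]
        # q ~ p must hold exactly when g(q) ~' g(p)
        if fwd.setdefault(b, b2) != b2 or bwd.setdefault(b2, b) != b:
            return None
    if len(bwd) != len(set(block_of2.values())):
        return None  # some class of M' is not hit
    return fwd

g = {"qb": "B", "q1": "A", "qf": "A", "q2": "B"}
A1, B1 = frozenset({"q1", "qf"}), frozenset({"q2", "qb"})
block_of = {"q1": A1, "qf": A1, "q2": B1, "qb": B1}
block_of2 = {"A": frozenset({"A"}), "B": frozenset({"B"})}
sol = {A1: HOLE, B1: ("sigma", ("qb", HOLE))}  # hypothetical signs of life for M
gbar = induced_bijection(g, block_of, block_of2)
sol2 = {gbar[b]: ren(g, c) for b, c in sol.items()}  # transferred signs of life for M'
# sol2[frozenset({'B'})] == ('sigma', ('B', '□'))
```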

Using this approach, corresponding equivalence classes receive the same sign of life (modulo the renaming $\operatorname{ren}_{g}$ of the states). We then minimize $M$ and $M^{\prime}$ using the method of Section 5 (i.e., we perform pushing followed by unweighted minimization). Finally, we test the obtained deterministic fta for isomorphism.
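For the final isomorphism test, note that the minimized automata are deterministic and, as we assume here, accessible, so any isomorphism must map each state to the state that the other automaton reaches on the same access tree; hence it suffices to check one candidate bijection, such as the one induced by $\overline{g}$. The Python sketch below performs this check under these assumptions. The function name and the small alphabetic fta, whose symbols are `(σ, weight)` pairs as produced by $\operatorname{alph}$, are hypothetical, and this is not necessarily the test used in Algorithm 3.

```python
def is_isomorphism(phi, delta1, final1, delta2, final2):
    """Check that the state bijection phi maps deterministic fta delta1 exactly onto delta2.

    delta1, delta2: {(symbol, sources): target}; both fta are assumed accessible,
    so every state occurs as the target of some transition.
    """
    if len(set(phi.values())) != len(phi):
        return False  # phi is not injective
    renamed = {(sym, tuple(phi[p] for p in srcs)): phi[tgt]
               for (sym, srcs), tgt in delta1.items()}
    return renamed == delta2 and {phi[q] for q in final1} == set(final2)

# Two hypothetical minimal alphabetic fta that differ only in their state names
delta1 = {(("alpha", 8), ()): "X", (("beta", 1), ()): "Y",
          (("gamma", 1), ("X",)): "Y", (("delta", 1), ("X",)): "X"}
delta2 = {(("alpha", 8), ()): "B", (("beta", 1), ()): "A",
          (("gamma", 1), ("B",)): "A", (("delta", 1), ("B",)): "B"}
phi = {"X": "B", "Y": "A"}  # candidate bijection, e.g. induced by the access trees
print(is_isomorphism(phi, delta1, {"Y"}, delta2, {"A"}))  # True
delta3 = dict(delta2)
delta3[(("alpha", 4), ())] = delta3.pop((("alpha", 8), ()))  # one pushed weight differs
print(is_isomorphism(phi, delta1, {"Y"}, delta3, {"A"}))  # False
```

Since the weights are part of the symbols, a single differing pushed weight already makes the test fail.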

Lemma 9.

We use the symbols of Algorithm 3. Given a compatible correspondence $g$ , the dwta $M$ and $M^{\prime}$ are equivalent if and only if the deterministic unweighted fta $\operatorname{alph}(\operatorname{push}_{\lambda}(M))$ and $\operatorname{alph}(\operatorname{push}_{\lambda^{\prime}}(M^{\prime}))$ are equivalent.

Proof 6.2.

Clearly, if the deterministic fta $\operatorname{alph}(\operatorname{push}_{\lambda}(M))$ and $\operatorname{alph}(\operatorname{push}_{\lambda^{\prime}}(M^{\prime}))$ are equivalent, then also $\operatorname{push}_{\lambda}(M)$ and $\operatorname{push}_{\lambda^{\prime}}(M^{\prime})$ are equivalent since the weights are annotated on the symbols of the former devices. Moreover, since pushing preserves the semantics (see Proposition 3), also the dwta $M$ and $M^{\prime}$ are equivalent, which concludes one direction. For the other direction, let $M$ and $M^{\prime}$ be equivalent. Then also $\operatorname{push}_{\lambda}(M)$ and $\operatorname{push}_{\lambda^{\prime}}(M^{\prime})$ are equivalent due to Proposition 3. An easy adaptation of the proof (of the equality (1) of the transition weights) of Lemma 5 can be used to show that the transition weights of corresponding transitions are equal and hence $\operatorname{alph}(\operatorname{push}_{\lambda}(M))$ and $\operatorname{alph}(\operatorname{push}_{\lambda^{\prime}}(M^{\prime}))$ are equivalent.

Lemma 9 proves the correctness of Algorithm 3 because the minimal deterministic fta for a given tree language is unique (up to isomorphism) [GS84, Theorem 2.11.12]. The run-time of our algorithm should be compared to the previously (asymptotically) fastest equivalence test for dwta of [DHM11], which runs in time $\mathcal{O}\bigl{(}\lvert M\rvert\cdot\lvert M^{\prime}\rvert\bigr{)}$ .

Theorem 10.

We can test equivalence of $M$ and $M^{\prime}$ in time $\mathcal{O}\bigl{(}(\lvert M\rvert+\lvert M^{\prime}\rvert)\log{(\lvert Q\rvert+\lvert Q^{\prime}\rvert)}\bigr{)}$ .

Acknowledgments

The authors gratefully acknowledge the insight and suggestions provided by the reviewers of the conference and the current version.

Bibliography


[Ang87] Dana Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87–106, 1987.
[BLB83] Symeon Bozapalidis and Olympia Louscou-Bozapalidou. The rank of a formal tree power series. Theoretical Computer Science, 27(1–2):211–215, 1983.
[BMV10] Matthias Büchse, Jonathan May, and Heiko Vogler. Determinization of weighted tree automata using factorizations. Journal of Automata, Languages and Combinatorics, 15(3–4):229–254, 2010.
[Bor03] Björn Borchardt. The Myhill-Nerode theorem for recognizable tree series. In Proc. 7th Int. Conf. Developments in Language Theory, volume 2710 of Lecture Notes in Computer Science, pages 146–158. Springer, 2003.
[Bor05] Björn Borchardt. The Theory of Recognizable Tree Series. PhD thesis, TU Dresden, 2005.
[Boz99] Symeon Bozapalidis. Equational elements in additive algebras. Theory of Computing Systems, 32(1):1–33, 1999.
[BR82] Jean Berstel and Christophe Reutenauer. Recognizable formal power series on trees. Theoretical Computer Science, 18(2):115–148, 1982.
[BV03] Björn Borchardt and Heiko Vogler. Determinization of finite state weighted tree automata. Journal of Automata, Languages and Combinatorics, 8(3):417–463, 2003.
