Stochastic Enumeration with Importance Sampling

Alathea Jensen

arXiv:1902.01698·cs.DS·February 6, 2019

Stochastic Enumeration with Importance Sampling

Alathea Jensen

PDF

TL;DR

This paper enhances the Stochastic Enumeration algorithm by integrating importance sampling, leading to significant variance reduction in estimating sums over decision trees, with theoretical analysis and empirical validation.

Contribution

It introduces an importance sampling extension to SE and provides theoretical and empirical evidence of its effectiveness in variance reduction.

Findings

01

Importance sampling reduces variance significantly

02

The algorithm is effective for counting linear extensions of posets

03

Theoretical analysis supports variance reduction claims

Abstract

Many hard problems in the computational sciences are equivalent to counting the leaves of a decision tree, or, more generally, summing a cost function over the nodes. These problems include calculating the permanent of a matrix, finding the volume of a convex polyhedron, and counting the number of linear extensions of a partially ordered set. Many approximation algorithms exist to estimate such sums. One of the most recent is Stochastic Enumeration (SE), introduced in 2013 by Rubinstein. In 2015, Vaisman and Kroese provided a rigorous analysis of the variance of SE, and showed that SE can be extended to a fully polynomial randomized approximation scheme for certain cost functions on random trees. We present an algorithm that incorporates an importance function into SE, and provide theoretical analysis of its efficacy. We also present the results of numerical experiments to measure the…

Equations180

Cost (T) = v \in V \sum c (v)

Cost (T) = v \in V \sum c (v)

Cost (T) = c_{0} + c_{1} D_{0} + c_{2} D_{0} D_{1} + \dots + c_{n} D_{0} D_{1} \dots D_{n - 1}

Cost (T) = c_{0} + c_{1} D_{0} + c_{2} D_{0} D_{1} + \dots + c_{n} D_{0} D_{1} \dots D_{n - 1}

c (v) = v \in v \sum c (v)

c (v) = v \in v \sum c (v)

S (v) = v \in v ⋃ S (v)

S (v) = v \in v ⋃ S (v)

H(\mathbf{v})=\big{\{}\mathbf{w}\subseteq S(\mathbf{v}):|\mathbf{w}|=\mathrm{min}(B,|S(\mathbf{v})|)\big{\}}

H(\mathbf{v})=\big{\{}\mathbf{w}\subseteq S(\mathbf{v}):|\mathbf{w}|=\mathrm{min}(B,|S(\mathbf{v})|)\big{\}}

T_{v} = v \in v ⋃ T_{v}

T_{v} = v \in v ⋃ T_{v}

Cost (T_{v}) = v \in v \sum Cost (T_{v})

Cost (T_{v}) = v \in v \sum Cost (T_{v})

C_{SEP} \leftarrow C_{SEP} + D

C_{SEP} \leftarrow C_{SEP} + D

D \leftarrow \frac{∣ S ( x _{k} ) ∣}{∣ x _{k} ∣} D

D \leftarrow \frac{∣ S ( x _{k} ) ∣}{∣ x _{k} ∣} D

D \leftarrow \frac{∣ S ( x _{0} ) ∣}{∣ x _{0} ∣} D = 2

D \leftarrow \frac{∣ S ( x _{0} ) ∣}{∣ x _{0} ∣} D = 2

C_{SEP} \leftarrow C_{SEP} + D = 3

C_{SEP} \leftarrow C_{SEP} + D = 3

D \leftarrow \frac{∣ S ( x _{1} ) ∣}{∣ x _{1} ∣} D = 3

D \leftarrow \frac{∣ S ( x _{1} ) ∣}{∣ x _{1} ∣} D = 3

C_{SEP} \leftarrow C_{SEP} + D = 6

C_{SEP} \leftarrow C_{SEP} + D = 6

D \leftarrow \frac{∣ S ( x _{2} ) ∣}{∣ x _{2} ∣} D = 4.5

D \leftarrow \frac{∣ S ( x _{2} ) ∣}{∣ x _{2} ∣} D = 4.5

C_{SEP} \leftarrow C_{SEP} + D = 10.5

C_{SEP} \leftarrow C_{SEP} + D = 10.5

D \leftarrow \frac{∣ S ( x _{3} ) ∣}{∣ x _{3} ∣} D = 2.25

D \leftarrow \frac{∣ S ( x _{3} ) ∣}{∣ x _{3} ∣} D = 2.25

C_{SEP} \leftarrow C_{SEP} + D = 12.75

C_{SEP} \leftarrow C_{SEP} + D = 12.75

C_{SEP} (T_{x_{0}}) = \frac{c ( x _{0} )}{∣ x _{0} ∣} + D_{0} \frac{c ( x _{1} )}{∣ x _{1} ∣} + D_{0} D_{1} \frac{c ( x _{2} )}{∣ x _{2} ∣} + \dots + D_{0} D_{1} \dots D_{τ - 1} \frac{c ( x _{τ} )}{∣ x _{τ} ∣} = \frac{c ( x _{0} )}{∣ x _{0} ∣} + D_{0} (\frac{c ( x _{1} )}{∣ x _{1} ∣} + D_{1} \frac{c ( x _{2} )}{∣ x _{2} ∣} + \dots + D_{2} \dots D_{τ - 1} \frac{c ( x _{τ} )}{∣ x _{τ} ∣})

C_{SEP} (T_{x_{0}}) = \frac{c ( x _{0} )}{∣ x _{0} ∣} + D_{0} \frac{c ( x _{1} )}{∣ x _{1} ∣} + D_{0} D_{1} \frac{c ( x _{2} )}{∣ x _{2} ∣} + \dots + D_{0} D_{1} \dots D_{τ - 1} \frac{c ( x _{τ} )}{∣ x _{τ} ∣} = \frac{c ( x _{0} )}{∣ x _{0} ∣} + D_{0} (\frac{c ( x _{1} )}{∣ x _{1} ∣} + D_{1} \frac{c ( x _{2} )}{∣ x _{2} ∣} + \dots + D_{2} \dots D_{τ - 1} \frac{c ( x _{τ} )}{∣ x _{τ} ∣})

C_{SEP} (T_{x_{0}}) = \frac{c ( x _{0} )}{∣ x _{0} ∣} + D_{0} C_{SEP} (T_{x_{1}})

C_{SEP} (T_{x_{0}}) = \frac{c ( x _{0} )}{∣ x _{0} ∣} + D_{0} C_{SEP} (T_{x_{1}})

C_{SEP} (T_{v}) = \frac{c ( v )}{∣ v ∣} + D_{0} C_{SEP} (T_{w}) = \frac{c ( v )}{∣ v ∣} + \frac{∣ w ∣ C _{SEP} ( T _{w} )}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 ) P ( w )}

C_{SEP} (T_{v}) = \frac{c ( v )}{∣ v ∣} + D_{0} C_{SEP} (T_{w}) = \frac{c ( v )}{∣ v ∣} + \frac{∣ w ∣ C _{SEP} ( T _{w} )}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 ) P ( w )}

\mathrm{Cost}\big{(}T_{S(\mathbf{v})}\big{)}=\sum_{\mathbf{w}\in H(\mathbf{v})}\frac{\mathrm{Cost}(T_{\mathbf{w}})}{\binom{|S(\mathbf{v})|-1}{|\mathbf{w}|-1}}

\mathrm{Cost}\big{(}T_{S(\mathbf{v})}\big{)}=\sum_{\mathbf{w}\in H(\mathbf{v})}\frac{\mathrm{Cost}(T_{\mathbf{w}})}{\binom{|S(\mathbf{v})|-1}{|\mathbf{w}|-1}}

w \in H (v) \sum \frac{Cost ( T _{w} )}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} = w \in H (v) \sum \frac{1}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} w \in w \sum Cost (T_{w})

w \in H (v) \sum \frac{Cost ( T _{w} )}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} = w \in H (v) \sum \frac{1}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} w \in w \sum Cost (T_{w})

w \in H (v) \sum \frac{Cost ( T _{w} )}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} = \frac{1}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} w \in H (v) \sum w \in w \sum Cost (T_{w})

w \in H (v) \sum \frac{Cost ( T _{w} )}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} = \frac{1}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} w \in H (v) \sum w \in w \sum Cost (T_{w})

w \in H (v) \sum \frac{Cost ( T _{w} )}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} = \frac{1}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} (∣ w ∣ - 1 ∣ S ( v ) ∣ - 1) w \in S (v) \sum Cost (T_{w}) = w \in S (v) \sum Cost (T_{w}) = Cost (T_{S (v)})

w \in H (v) \sum \frac{Cost ( T _{w} )}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} = \frac{1}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} (∣ w ∣ - 1 ∣ S ( v ) ∣ - 1) w \in S (v) \sum Cost (T_{w}) = w \in S (v) \sum Cost (T_{w}) = Cost (T_{S (v)})

\mathbb{E}\big{[}C_{\mathrm{SEP}}(T_{\mathbf{v}})\big{]}=\frac{\mathrm{Cost}(T_{\mathbf{v}})}{|\mathbf{v}|}

\mathbb{E}\big{[}C_{\mathrm{SEP}}(T_{\mathbf{v}})\big{]}=\frac{\mathrm{Cost}(T_{\mathbf{v}})}{|\mathbf{v}|}

\frac{c ( v )}{∣ v ∣} = \frac{Cost ( T _{v} )}{∣ v ∣}

\frac{c ( v )}{∣ v ∣} = \frac{Cost ( T _{v} )}{∣ v ∣}

E [C_{SEP} (T_{v})] = E [\frac{c ( v )}{∣ v ∣} + \frac{∣ w ∣ C _{SEP} ( T _{w} )}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 ) P ( w )}] = \frac{c ( v )}{∣ v ∣} + E [\frac{∣ w ∣ C _{SEP} ( T _{w} )}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 ) P ( w )}] = \frac{c ( v )}{∣ v ∣} + w \in H (v) \sum P (w) \frac{∣ w ∣ E [ C _{SEP} ( T _{w} )]}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 ) P ( w )} = \frac{c ( v )}{∣ v ∣} + w \in H (v) \sum \frac{∣ w ∣}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} E [C_{SEP} (T_{w})] = \frac{c ( v )}{∣ v ∣} + w \in H (v) \sum \frac{∣ w ∣}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} \frac{Cost ( T _{w} )}{∣ w ∣} = \frac{c ( v )}{∣ v ∣} + \frac{1}{∣ v ∣} w \in H (v) \sum \frac{Cost ( T _{w} )}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )}

E [C_{SEP} (T_{v})] = E [\frac{c ( v )}{∣ v ∣} + \frac{∣ w ∣ C _{SEP} ( T _{w} )}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 ) P ( w )}] = \frac{c ( v )}{∣ v ∣} + E [\frac{∣ w ∣ C _{SEP} ( T _{w} )}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 ) P ( w )}] = \frac{c ( v )}{∣ v ∣} + w \in H (v) \sum P (w) \frac{∣ w ∣ E [ C _{SEP} ( T _{w} )]}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 ) P ( w )} = \frac{c ( v )}{∣ v ∣} + w \in H (v) \sum \frac{∣ w ∣}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} E [C_{SEP} (T_{w})] = \frac{c ( v )}{∣ v ∣} + w \in H (v) \sum \frac{∣ w ∣}{∣ v ∣ ( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )} \frac{Cost ( T _{w} )}{∣ w ∣} = \frac{c ( v )}{∣ v ∣} + \frac{1}{∣ v ∣} w \in H (v) \sum \frac{Cost ( T _{w} )}{( ∣ w ∣ - 1 ∣ S ( v ) ∣ - 1 )}

E [C_{SEP} (T_{v})] = \frac{c ( v )}{∣ v ∣} + \frac{Cost ( T _{S (v)} )}{∣ v ∣} = \frac{Cost ( T _{v} )}{∣ v ∣}

E [C_{SEP} (T_{v})] = \frac{c ( v )}{∣ v ∣} + \frac{Cost ( T _{S (v)} )}{∣ v ∣} = \frac{Cost ( T _{v} )}{∣ v ∣}

P (w) = \frac{Cost ( T _{w} )}{x \in H ( v ) \sum Cost ( T _{x} )}

P (w) = \frac{Cost ( T _{w} )}{x \in H ( v ) \sum Cost ( T _{x} )}

C_{SEP} (T_{v}) = \frac{Cost ( T _{v} )}{∣ v ∣}

C_{SEP} (T_{v}) = \frac{Cost ( T _{v} )}{∣ v ∣}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Stochastic Enumeration with Importance Sampling

Alathea Jensen

(December 5, 2017)

Abstract

Many hard problems in the computational sciences are equivalent to counting the leaves of a decision tree, or, more generally, by summing a cost function over the nodes. These problems include calculating the permanent of a matrix, finding the volume of a convex polyhedron, and counting the number of linear extensions of a partially ordered set. Many approximation algorithms exist to estimate such sums. One of the most recent is Stochastic Enumeration (SE), introduced in 2013 by Rubinstein. In 2015, Vaisman and Kroese provided a rigorous analysis of the variance of SE, and showed that SE can be extended to a fully polynomial randomized approximation scheme for certain cost functions on random trees. We present an algorithm that incorporates an importance function into SE, and provide theoretical analysis of its efficacy. We also present the results of numerical experiments to measure the variance of an application of the algorithm to the problem of counting linear extensions of a poset, and show that introducing importance sampling results in a significant reduction of variance as compared to the original version of SE.

Acknowledgments

This is a pre-print of an article published in Methodology and Computing in Applied Probability. The final authenticated version is available online at:

https://doi.org/10.1007/s11009-018-9619-2

The author would like to thank Isabel Beichl and Francis Sullivan for the idea for this project. The author would also like to thank the Applied and Computational Mathematics Division of the Information Technology Laboratory at the National Institute of Standards and Technology for hosting the author as a guest researcher during the preparation of this article.

1 Introduction

Many hard problems in mathematics, computer science, and the physical sciences are equivalent to summing a cost function over a tree. These problems include calculating the permanent of a matrix, finding the volume of a convex polyhedron, and counting the number of linear extensions of a partially ordered set.

There are tree-searching algorithms which give an exact answer by simply traversing every node in the tree; however, in many cases, the tree is far too large for this to be practical. Indeed, the problem of computing tree cost is in the complexity class #P-complete (Valiant, 1979). This complexity class consists of counting problems which find the number of solutions that satisfy a corresponding NP-complete decision problem.

Accordingly, there are various approximation algorithms for tree cost, and the two main types of these are Markov Chain Monte Carlo (MCMC) and sequential importance sampling (SIS). Both of these perform random sampling on a suitably defined set.

The original version of SIS is Knuth’s algorithm (Knuth, 1975), which samples tree cost by walking a random path from the root to a leaf, where each node in the path is chosen uniformly from the children of the previously chosen node. There have been several major adaptations to Knuth’s algorithm, all of which attempt to reduce the variance of the estimates produced.

One modification of Knuth’s algorithm is to choose the nodes of the path non-uniformly, proportional to an importance function on the nodes. Of course, choosing a good importance function requires some knowledge about the structure of the tree, and so this approach is not suitable for random trees, but rather for families of trees which share some general characteristics. Some cases where this approach has produced good results can be found in Beichl and Sullivan (1999), Blitzstein and Diaconis (2011), Harris, Sullivan, and Beichl (2014), Karp and Luby (1983), for example.

There have also been adaptations of Knuth’s algorithm which change the algorithm in a more structural way, such as stratified sampling, which was introduced by Knuth’s student, Chen (1992).

Stochastic Enumeration (SE) is the most recent of the structural adaptations. It was originally introduced by Rubinstein (2013), and further developed in Rubinstein, Ridder, and Vaisman (2014). Its approach to the problem is to run many non-independent trajectories through the tree in parallel, combining their effect on the estimate at each level of the tree to produce a single final estimate of the tree cost. Alternatively, one can view SE as operating on a hypertree associated with the original tree. A similar approach to the problem was taken by Cloteaux and Valentin (2011).

In Rubinstein’s original definition, the SE algorithm was only able to count the leaves of a tree. Vaisman and Kroese (2017) updated SE to estimate tree cost for any cost function, and provided a rigorous analysis of the variance. They also showed that SE can be extended to an fully polynomial randomized approximation scheme (FPRAS) for random trees with a cost function that is 1 on every node.

In this paper, we follow up on the work of Vaisman and Kroese to develop an adaptation of SE which we call Stochastic Enumeration with Importance (SEI). This algorithm chooses paths through the tree with non-uniform probability, according to a user-defined importance function on the nodes of the tree. We provide a detailed analysis of the theoretical properties of the algorithm, including ways to bound the variance.

Just as with SIS, SEI is not suitable for random trees, but rather for families of trees which share some characteristics. Therefore, in addition to theoretical analysis in which the importance function is not specified, we also provide a detailed example, with numerical results, of a family of trees and importance functions for which SEI provides a lower variance than SE.

2 Definitions and Preliminaries

Consider a tree $T$ with node set $\mathcal{V}$ , where each node $v$ has some cost $c(v)$ given by a cost function $c:\mathcal{V}\to\mathbb{R}_{\geq 0}$ . We wish to estimate the total cost of the tree, denoted $\mathrm{Cost}(T)$ and given by

[TABLE]

If our tree is uniform, in the sense that all the nodes on a given level have the same number of children, then it is very easy to determine the number of nodes on each level.

We will call the root node level 0, the root’s children level 1, and so on. Suppose the root has $D_{0}$ children, the root’s children all have $D_{1}$ children, and so on. Then there is 1 node on level 0, $D_{0}$ nodes on level 1, $D_{0}D_{1}$ nodes on level 2, and, in general, $D_{0}D_{1}\cdots D_{i-1}$ nodes on level $i$ .

If the cost of nodes is also uniform across each level, then we can easily add up the cost of the entire tree. For each level $i$ , let the cost of any node on level $i$ be denoted $c_{i}$ . Then the cost of our tree is

[TABLE]

where $n$ is the lowest level of the tree.

Of course, most trees are not uniform is the sense described above, but the central idea of Knuth’s algorithm (Knuth, 1975) for estimating tree cost is to pretend as though they are. In Knuth’s algorithm, we walk a single path from the root to a leaf, and note the number of children that we see from each node in our path ( $D_{0},D_{1},\dots,D_{n}$ ), as well as the cost of each node in our path ( $c_{0},c_{1},\dots,c_{n}$ ). We then calculate the cost of the tree using Formula (1), which is no longer exact but is now an unbiased estimator of the tree cost.

In the SE algorithm, just as in Knuth’s algorithm, we work our way down the tree level by level from the root to the leaves. The main difference is that instead of choosing a single node on each level of the tree, we choose multiple nodes on each level. We can also think of this as choosing a single hypernode from each level of a hypertree constructed from the original tree. The following definitions are necessary to describe the structure of the hypertree.

We define a hypernode to be a set of distinct nodes $\mathbf{v}=\{v_{1},\dots,v_{m}\}\subset\mathcal{V}$ that are in the same level of the tree. We can extend the definition of the cost function to hypernodes by letting

[TABLE]

Let $S(v)$ denote the set of successors (or children) of a node in the tree. Then we can define the set of successors of a hypernode $\mathbf{v}$ as

[TABLE]

Throughout the SE algorithm, each time we move to a new level, we choose a new hypernode from among the successors $S(\mathbf{v})$ of the previous hypernode $\mathbf{v}$ . We make no distinction between these successors in terms of which node in the previous hypernode they came from. This means that some nodes in the previous hypernode may have multiple children chosen to be in the next hypernode, while other nodes in the previous hypernode may not have any children chosen to be in the next hypernode.

Obviously there is some limit on our computing power, so we have to limit the size of the hypernodes we work with to be within a budget, which we will denote $B\in\mathbb{N}$ . At each level, we will choose $B$ nodes to be in the next hypernode, as long as $S(\mathbf{v})$ is larger than $B$ . If $|S(\mathbf{v})|\leq B$ , then we will take all of $S(\mathbf{v})$ to be the next hypernode.

Thus, if our current hypernode is $\mathbf{v}$ , the candidates for our next hypernode, which we call the hyperchildren of $\mathbf{v}$ , are the elements of the set

[TABLE]

Many of the statements and proofs throughout this paper are in a recursive form that refers to subforests of a tree, and so we lastly need to define a forest rooted at a hypernode. For a hypernode $\mathbf{v}$ , the forest rooted at $\mathbf{v}$ , denoted $T_{\mathbf{v}}$ , is simply the union of all the trees rooted at each of the nodes in $\mathbf{v}$ .

[TABLE]

We can also extend the notion of the total cost of a tree to a forest rooted at a hypernode by letting

[TABLE]

Let’s look at an example to familiarize ourselves further with the notation.

Example 2.1.

Consider the tree in Figure 1. It is labeled with a possible sequence of hypernodes that could be chosen by the SE algorithm, using a budget of $B=2$ .

On level 0, the root is automatically chosen to be the first hypernode, $\mathbf{x}_{0}$ . We could then refer to the entire tree as $T_{\mathbf{x}_{0}}$ . On level 1, we have $S(\mathbf{x}_{0})=\{b,c\}$ . Since $|S(\mathbf{x}_{0})|\leq B$ , we take all of $S(\mathbf{x}_{0})$ to be our next hypernode, so $\mathbf{x}_{1}=\{b,c\}$ .

On level 2, we have $S(\mathbf{x}_{1})=\{d,e,f\}$ , so our choices for $\mathbf{x}_{2}$ are the elements of $H(\mathbf{x}_{1})=\{\{d,e\},\{d,f\},\{e,f\}\}$ . Let’s choose $\mathbf{x}_{2}=\{d,e\}$ . Similarly, on level 3, we have $S(\mathbf{x}_{2})=\{g,h,i\}$ , so our choices for $\mathbf{x}_{3}$ are $H(\mathbf{x}_{2})=\{\{g,h\},\{g,i\},\{h,i\}\}$ . Let’s choose $\mathbf{x}_{3}=\{h,i\}$ .

Finally, on level 4, we have $S(\mathbf{x}_{3})=\{m\}$ . Since $|S(\mathbf{x}_{3})|\leq B$ , we take all of $S(\mathbf{x}_{3})$ to be our next hypernode, so $\mathbf{x}_{4}=\{m\}$ .∎

3 Stochastic Enumeration with Arbitrary Probability

We are now ready to state the first algorithm, Stochastic Enumeration with arbitrary probability (SEP). It is a generalization of the updated Stochastic Enumeration algorithm in Vaisman and Kroese (2017), which used uniform probabilities.

Note that the quantity $D_{k}$ is an estimate of the number of children of the nodes in level $k$ , so that after each update in line 5, $D$ is an estimate of the number of nodes in level $k+1$ of the tree.

Likewise, the quantity $\frac{c(\mathbf{x}_{k+1})}{|\mathbf{x}_{k+1}|}$ is an estimate of the average cost of nodes on level $k+1$ , so that by adding $\frac{c(\mathbf{x}_{k+1})}{|\mathbf{x}_{k+1}|}D$ to $C_{\mathrm{SEP}}$ on line 5, we are adding the estimated cost of all of level $k+1$ of the tree.

Before analyzing this algorithm further, let’s look at an example to get a better idea of how it works.

Example 3.1.

Consider the tree in Figure 2. To keep things simple, we’ll use a budget of $B=2$ and a cost function $c$ that is 1 on every node. Clearly the total cost of the tree is the number of nodes, 14. This choice simplifies $\frac{c(\mathbf{x}_{k+1})}{|\mathbf{x}_{k+1}|}$ to 1, so the update command for $C_{\mathrm{SEP}}$ becomes

[TABLE]

Let’s choose hypernodes with a uniform probability, meaning $P(\mathbf{x}_{k+1})=1/|H(\mathbf{x}_{k})|$ . Since $|H(\mathbf{x}_{k})|=\binom{|S(\mathbf{x}_{k})|}{|\mathbf{x}_{k+1}|}$ , this makes the formula for $D_{k}$ simplify to $\frac{|S(\mathbf{x}_{k})|}{|\mathbf{x}_{k}|}$ , so the update command for $D$ becomes

[TABLE]

Note that $\frac{|S(\mathbf{x}_{k})|}{|\mathbf{x}_{k}|}$ is the average number of children of the nodes in $\mathbf{x}_{k}$ . In the original SE algorithm, the update command for $D$ always looks like this.

Now let’s examine a possible sequence of hypernodes produced by Algorithm 2, as shown in Figure 2, which is the same as the previous example.

We initialize with $k=0$ , $\mathbf{x}_{0}=\{a\}$ , $D=1$ , $C_{\mathrm{SEP}}=1$ . Then we compute $S(\mathbf{x}_{0})=\{b,c\}$ , which means $H(\mathbf{x}_{0})=\{\{b,c\}\}$ , and advance to $\mathbf{x}_{1}=\{b,c\}$ with $P(\mathbf{x}_{1})=1$ . We update

[TABLE]

We advance to $k=1$ and loop. We compute $S(\mathbf{x}_{1})=\{d,e,f\}$ , which means $H(\mathbf{x}_{1})=\{\{d,e\},\{d,f\},\{e,f\}\}$ , and advance to $\mathbf{x}_{2}=\{d,e\}$ with $P(\mathbf{x}_{2})=\frac{1}{3}$ . We update

[TABLE]

We advance to $k=2$ and loop. We compute $S(\mathbf{x}_{2})=\{g,h,i\}$ , which means $H(\mathbf{x}_{2})=\{\{g,h\},\{g,i\},\{h,i\}\}$ , and we advance to $\mathbf{x}_{3}=\{h,i\}$ with $P(\mathbf{x}_{3})=\frac{1}{3}$ . We update

[TABLE]

We advance to $k=3$ and loop. We compute $S(\mathbf{x}_{3})=\{m\}$ , which means $H(\mathbf{x}_{3})=\{\{m\}\}$ , and we advance to $\mathbf{x}_{4}=\{m\}$ with $P(\mathbf{x}_{1})=1$ . We update

[TABLE]

We increase to $k=4$ and loop. We compute $S(\mathbf{x}_{4})=\emptyset$ , so we are in the terminal position and we stop. The algorithm returns $|\mathbf{x}_{0}|C_{\mathrm{SEP}}=12.75$ as an estimator of the cost of the tree. This completes the example.∎

Now we begin our analysis of Algorithm 1. In general, the output of Algorithm 1 is a random variable

[TABLE]

where $\tau$ is some height less than or equal to the height of $T_{\mathbf{x}_{0}}$ .

This naturally suggests a recursive formulation of the output,

[TABLE]

Let $\mathbf{w}$ be a hyperchild of $\mathbf{v}$ selected from $H(\mathbf{v})$ with probability $P(\mathbf{w})$ . Then we have

[TABLE]

Before proceeding to a proof of the correctness of Algorithm 1, we stop to note a lemma that we will use in this and other proofs throughout the paper.

Lemma 3.1.

[TABLE]

Proof.

We begin by expanding the right hand side of the proposed equation.

[TABLE]

Since $|\mathbf{w}|=\mathrm{min}(B,|S(\mathbf{v})|)$ does not depend on the particular choice of $\mathbf{w}$ , we can move the factor in which it appears outside the summation.

[TABLE]

Each $w\in S(\mathbf{v})$ appears in precisely $\binom{|S(\mathbf{v})|-1}{|\mathbf{w}|-1}$ of the $\mathbf{w}\in H(\mathbf{v})$ , therefore we can simplify the double summation.

[TABLE]

∎∎

Theorem 3.1.

Algorithm 1 is an unbiased estimator of tree cost, meaning

[TABLE]

Proof.

The proof proceeds by induction over the height of the tree. For a forest of height 0, we have $|S(\mathbf{v})|=0$ , so the algorithm returns the exact answer

[TABLE]

Assuming that the proposition is correct for forests with heights strictly less than the height of $T_{\mathbf{v}}$ , we have

[TABLE]

Applying Lemma 3.1, we get

[TABLE]

∎∎

Now that we know Algorithm 1 works, we can start thinking about how to improve the variance of the estimates it produces.

The purpose of using a non-uniform probability distribution to select each hypernode is to try to achieve a better variance between the estimates. Therefore, it is important to know the optimal probability distribution, in other words, the probability distribution that would yield the exact answer for every estimate.

As with Knuth’s algorithm, it turns out that the optimal probability for choosing a hypernode is proportional to the cost of the forest rooted at the hypernode. Details are given below.

Theorem 3.2.

In Algorithm 1, if each hypernode $\mathbf{w}$ is chosen from all possible hypernodes in $H(\mathbf{v})$ with probability

[TABLE]

then $C_{\mathrm{SEP}}$ is a zero-variance estimator, meaning

[TABLE]

Proof.

The proof proceeds by induction over the height of the tree. For a tree of height 0, we have $|S(\mathbf{v})|=0$ , so the algorithm returns the exact answer

[TABLE]

Assuming that the proposition is correct for forests with heights strictly less than the height of $T_{\mathbf{v}}$ , we have

[TABLE]

Applying Lemma 3.1, we get

[TABLE]

∎∎

We are now ready to discuss using an importance function to implement a probability distribution.

4 Stochastic Enumeration with Importance

The information in Theorem 3.2 suggests that we should use a probability distribution in which each hypernode has a probability that is proportional to the cost of the forest beginning at that hypernode. Obviously this will be difficult to achieve even as an estimate, since it is the same problem that we are trying to address with our algorithms.

However, even supposing that we did have some way of estimating the ideal probability for each hypernode, there is another problem with trying to implement a non-uniform probability distribution on the hypernodes. Simply put, $|H(\mathbf{v})|=\binom{|S(\mathbf{v})|}{|\mathbf{w}|}$ may be extremely large, and so, if we hope to keep the running time of the algorithm under control, we need a way of choosing hypernodes that does not require us to calculate or store the probability of each individual hypernode in $H(\mathbf{v})$ .

It turns out that there is an easy way to do this. Consider a function $r$ from the nodes of a tree to the positive real numbers. For a node $x$ , we will call $r(x)$ the weight of $x$ or the importance of $x$ . We can extend the domain of $r$ to sets of nodes by defining the weight of a set of nodes $\mathbf{x}=\{x_{1},x_{2},\ldots,x_{m}\}$ as $r(\mathbf{x})=r(x_{1})+r(x_{2})+\cdots+r(x_{m})$ .

Given this weighting scheme, there is a way to choose a hypernode $\mathbf{w}$ with probability

[TABLE]

that only requires us to calculate the weights of $S(\mathbf{v})$ , and not of $H(\mathbf{v})$ . This method is described in Algorithm 2.

It may not be obvious, but Algorithm 2 is simply Algorithm 1 with a specific probability distribution implemented, as we shall prove now.

Theorem 4.1.

Algorithm 2 is an unbiased estimator of tree cost, meaning

[TABLE]

Proof.

We begin by calculating the probability with which each $\mathbf{x}_{k+1}$ is being selected.

Since one element, $x$ , is selected separately from the rest of $\mathbf{x}_{k+1}$ , there are $|\mathbf{x}_{k+1}|$ different and mutually exclusive ways in which we can get the same $\mathbf{x}_{k+1}$ . This is because each element in $\mathbf{x}_{k+1}$ can play the role of $x$ .

Once an $x$ has been selected from $S(\mathbf{x}_{k})$ with probability ${r(x)}/{r(S(\mathbf{x}_{k}))}$ , the rest of the elements are selected uniformly at random from the remaining elements in $S(\mathbf{x}_{k})$ , so the remaining elements are collectively selected with probability $1/\binom{|S(\mathbf{x}_{k})|-1}{|\mathbf{x}_{k+1}|-1}$ .

Therefore the probability with which any given $\mathbf{x}_{k+1}$ is selected is

[TABLE]

The formula for $D_{k}$ in Algorithm 2 is then obtained by a simple substitution into the formula given in Algorithm 1, and so the proposition follows from Theorem 3.1. ∎∎

Let $\mathbf{w}$ be selected from $H(\mathbf{v})$ as described in Algorithm 2. Then the probability with which $\mathbf{w}$ is selected is

[TABLE]

Since each $x\in S(\mathbf{v})$ appears in precisely $\binom{|S(\mathbf{v})|-1}{|\mathbf{w}|-1}$ of the $\mathbf{x}\in H(\mathbf{v})$ , we can also write this as

[TABLE]

which was the desired probability. Clearly, from Theorem 3.2, the ideal importance function would be $r(\mathbf{w})=\mathrm{Cost}(T_{\mathbf{w}})$ .

Before analyzing this algorithm any further, let’s look at an example to get a better idea of how it works.

Example 4.1.

Consider the tree in Figure 3, which is the same as that in the previous examples, except that it has been labeled with importance function values in addition to the names of the nodes.

To keep things simple, we are reusing as many parameters as possible from Example 3.1, so the budget is $B=2$ and the cost function $c$ is 1 on every node. Again, the total cost of the tree is the number of nodes, 14, and this choice simplifies $\frac{c(\mathbf{x}_{k+1})}{|\mathbf{x}_{k+1}|}$ to 1, so the update command for $C_{\mathrm{SEI}}$ becomes

[TABLE]

The importance function we are using for each node $x$ is the number of leaves under $x$ , including $x$ itself if it is a leaf. We have labeled the importance of each node after the node’s name in the figure.

Now let’s examine a possible sequence of hypernodes produced by Algorithm 2, as shown in the figure.

We initialize with $k=0$ , $\mathbf{x}_{0}=\{a\}$ , $D=1$ , $C_{\mathrm{SEI}}=1$ . Then we compute $S(\mathbf{x}_{0})=\{b,c\}$ . We choose $c$ with probability

[TABLE]

and then choose $b$ uniformly at random from the remaining elements, to give us $\mathbf{x}_{1}=\{b,c\}$ . We update

[TABLE]

We increase to $k=1$ and loop. We compute $S(\mathbf{x}_{1})=\{d,e,f\}$ . We choose $e$ with probability

[TABLE]

and then choose $d$ uniformly at random from the remaining elements, giving us $\mathbf{x}_{2}=\{d,e\}$ . We update

[TABLE]

We increase to $k=2$ and loop. We compute $S(\mathbf{x}_{2})=\{g,h,i\}$ . We choose $i$ with probability

[TABLE]

and then choose $h$ uniformly at random from the remaining elements, giving us $\mathbf{x}_{2}=\{h,i\}$ . We update

[TABLE]

We increase to $k=3$ and loop. We compute $S(\mathbf{x}_{3})=\{m\}$ , and we choose $m$ with probability

[TABLE]

Since there are no remaining elements to be chosen, we have $\mathbf{x}_{4}=\{m\}$ . We then update

[TABLE]

We increase to $k=4$ and loop. We compute $S(\mathbf{x}_{4})=\emptyset$ , so we are in the terminal position and we stop. The algorithm returns $|\mathbf{x}_{0}|C_{\mathrm{SEP}}=13$ as an estimator of the cost of the tree. This completes the example.∎

5 Variance

Recall that in Equation 2, we found a recursive expression for the output of Algorithm 1 as

[TABLE]

By substituting for $P(\mathbf{w})$ with the expression we found in the proof of Theorem 4.1, we get another recursive formula for the output of Algorithm 2.

[TABLE]

With this information we can begin to analyze the variance of $C_{\mathrm{SEI}}$ , or rather, the variance of $|\mathbf{v}|C_{\mathrm{SEI}}$ , which is the actual estimate of tree cost produced by Algorithm 2.

Theorem 5.1.

For a forest $T_{\mathbf{v}}$ rooted at a hypernode $\mathbf{v}$ , the variance produced by Algorithm 2 is

[TABLE]

Proof.

We know

[TABLE]

which implies

[TABLE]

Taking the variance of both sides, we get

[TABLE]

and so

[TABLE]

We will tackle each of these terms separately. First,

[TABLE]

Next,

[TABLE]

where the last line is due to Lemma 3.1.

Substituting the results of Equations 4 and 5 into 3 yields the theorem statement. ∎∎

From Theorem 5.1 we can easily find an expression for the coefficient of variation (CV).

Corollary 5.1.

For a forest $T_{\mathbf{v}}$ rooted at a hypernode $\mathbf{v}$ , the coefficient of variation produced by Algorithm 2 is given by

[TABLE]

Proof.

This follows from Theorem 5.1 by dividing both sides of that equation by $\big{(}\mathrm{Cost}(T_{\mathbf{v}})\big{)}^{2}$ and then factoring out $\big{(}\mathrm{Cost}(T_{\mathbf{w}})\big{)}^{2}$ from $\mathrm{Var}\big{(}|\mathbf{w}|C_{\mathrm{SEI}}(T_{\mathbf{w}})\big{)}+\mathrm{Cost}(T_{\mathbf{w}})^{2}$ . ∎∎

6 Upper Bounds on the Variance

The first bound we present is almost as complicated as the formula for the coefficient of variation given in the previous section; however, this bound is very useful for proving other bounds because of its recursive structure, and so we present it first.

Theorem 6.1.

For a forest $T_{\mathbf{v}}$ rooted at a hypernode $\mathbf{v}$ , the coefficient of variation produced by Algorithm 2 is bounded by

[TABLE]

Proof.

To save space we will refer to $\mathrm{CV}^{2}\big{(}|\mathbf{v}|C_{\mathrm{SEI}}(T_{\mathbf{v}})\big{)}+1$ as $F(\mathbf{v})$ and likewise for other hypernodes. With this notation, Corollary 5.1 says that

[TABLE]

Now we factor the right hand side (moving the subtracted term before the sum to avoid confusion) and get

[TABLE]

Since the first factor on the right hand side is less than 1, this implies

[TABLE]

After canceling $-1$ from both sides, we have the result. ∎∎

Next, we will show we can bound the coefficient of variation by a measure of how close the importance function $r$ is to the Cost function. First, we need a few new definitions.

For a tree $T_{\mathbf{v}}$ of height $n$ , rooted at $\mathbf{v}$ , define $\mathbf{x}_{0}:=\mathbf{v}$ , and let $\mathbf{x}_{0},\mathbf{x}_{1},\dots,\mathbf{x}_{n}$ be any possible sequence of hypernodes that can be produced by Algorithm 2. We define a function $\alpha$ on such sequences that describes how close our importance function is to the Cost function over the sequence. Let $\alpha$ be given by

[TABLE]

Note that for a tree of height zero, the product is empty, and so $\alpha=1$ . We also have $\alpha=1$ in the case that we use the exact Cost function for the importance function.

Before stating our next bound, we first need a lemma regarding the expected value of $\alpha$ .

Lemma 6.1.

For a forest $T_{\mathbf{v}}$ , let $\mathbf{v},\mathbf{x}_{1},\dots,\mathbf{x}_{n}$ be any possible hypernode sequence produced by Algorithm 2. Then

[TABLE]

Proof.

The proof proceeds by induction on the height of the tree. For a tree $T_{\mathbf{v}}$ of height 0, $\alpha(\mathbf{v})=1$ always.

For the inductive step, let $T_{\mathbf{v}}$ be a forest of height $n$ and assume the proposition for all forests of heights $n-1$ or less. Define $\mathbf{x}_{0}:=\mathbf{v}$ . Then

[TABLE]

The right hand side sum can be expanded to

[TABLE]

Recall that the probability of choosing $\mathbf{x}_{1}$ from $H(\mathbf{v})$ is

[TABLE]

Hence

[TABLE]

where the last line is due to Lemma 3.1. ∎∎

Now we are ready to state our next bound.

Theorem 6.2.

For a forest $T_{\mathbf{v}}$ , let $\mathbf{v},\mathbf{x}_{1},\dots,\mathbf{x}_{n}$ be any possible hypernode sequence produced by Algorithm 2. Then

[TABLE]

Proof.

The proof proceeds by induction on the height of the tree. For a tree $T_{\mathbf{v}}$ of height 0, $\alpha(\mathbf{v})=1$ , and the algorithm is exact, so both sides of the inequality are zero.

For the inductive step, let $T_{\mathbf{v}}$ be a forest of height $n$ and assume the proposition for all forests of heights $n-1$ or less. Define $\mathbf{x}_{0}:=\mathbf{v}$ . Then Theorem 6.1 says

[TABLE]

Recall that the probability of choosing $\mathbf{x}_{1}$ from $H(\mathbf{x}_{0})$ is

[TABLE]

This implies

[TABLE]

Substituting the right hand side of (5) for the left hand side of (5) in the inequality yields

[TABLE]

Then from the induction hypothesis, this becomes

[TABLE]

Then due to Lemma 6.1 we can write this as

[TABLE]

∎∎

Recognizing that it may be very difficult to calculate the variance of $\alpha$ , we now introduce a looser bound which may be easier to calculate.

Theorem 6.3.

For a forest $T_{\mathbf{v}}$ , let $\mathcal{X}(\mathbf{v})$ be the collection of all possible hypernode sequences $(\mathbf{x}_{1},\dots,\mathbf{x}_{n})$ which can occur after $\mathbf{x}_{0}=\mathbf{v}$ in Algorithm 2. Then

[TABLE]

Proof.

The proof proceeds by induction on the height of the tree. For a tree $T_{\mathbf{v}}$ of height 0, we have $\alpha(\mathbf{v})=1$ , so the right hand side is zero, and we know that the algorithm is exact on trees of height zero, so the left hand side is also zero.

For the inductive step, let $T_{\mathbf{v}}$ be a forest of height $n$ and assume the proposition for all forests of heights $n-1$ or less. Define $\mathbf{x}_{0}:=\mathbf{v}$ . Then Theorem 6.1 and the inductive hypothesis give us

[TABLE]

Due to Lemma 3.1, the summation simplifies to 1, which yields the proposition. ∎∎

In case the last bound was still too difficult to calculate because it requires taking a maximum over a very large set, we now relax the bound to a product of maximums over smaller sets.

Corollary 6.1.

For a forest $T_{\mathbf{v}}$ , let $\mathcal{L}_{i}(\mathbf{v})$ be the collection of all possible hypernodes $\mathbf{x}_{i}$ which can be chosen by Algorithm 2 at level $i$ of the tree, where level 0 is the root $\mathbf{v}$ . Also let $\mathbf{x}_{0}=\mathbf{v}$ . Then

[TABLE]

Proof.

Since

[TABLE]

this follows directly from Theorem 6.3 by expanding the set over which the maximum is taken. ∎∎

7 Application: Counting Linear Extensions

7.1 Background

For a partially ordered set (poset), any total order which respects the partial order is known as a linear extension. If we view the poset as a directed acyclic graph (DAG), then a linear extension is also known as a topological sort. Determining the number of linear extensions of a given poset is a fundamental problem in the study of ordering and has many applications.

The standard method of obtaining a linear extension, which was first described by Kahn (1962), corresponds to a breadth-first search of the corresponding DAG. The procedure is as follows. We choose some maximal element of the poset, and then delete that element from the poset, thus newly rendering some elements maximal. We repeat until there are no elements left in the poset. The order in which the elements were deleted is then a linear extension of the poset.

In fact, all linear extensions can be obtained in this manner, so the number of linear extensions is equal to the number of ways to execute the procedure. We can think of the choices available to us at each step as forming a decision tree, where each branching corresponds to a choice of a maximal element, and each path from root to leaf corresponds to one linear extension. If we let the cost function on the decision tree be one on the leaves and zero everywhere else, then the cost of the entire tree is the number of linear extensions.

Example 7.1.

Consider the poset whose Hasse diagram is shown in the left panel of Figure 4. The corresponding decision tree is shown in the right panel. Each node in the decision tree is labeled according to which maximal node in the poset was deleted in order to continue the linear extension. This poset has seven linear extensions, as shown by the decision tree.

Notice that the two boxed regions of the decision tree are identical. This is because they both occur after the deletion of $a$ and $b$ , in either order. There are other identical regions, as well, such as those that occur after deletion of $a,b,d$ in any of the three possible orders.∎

Aside from posets which are already linearly ordered, all linear extension decision trees contain identical regions such as those highlighted in the example. For example, in any poset with at least two maximal elements, $a$ and $b$ , the region of the decision tree corresponding to extensions beginning with $ab$ will be identical to the region of the decision tree corresponding to extensions beginning with $ba$ .

Hence, even if a poset is generated randomly, the linear extension decision tree arising from the poset will not be random because many regions within it are not independent of one another. If the decision tree were completely random, there would be no reason to place more importance on any one particular branch than another. Since the tree is not completely random, however, we have reason to suppose that we may benefit by introducing an importance function.

7.2 Importance Functions

Three different importance functions were tested, all of which were compared to the uniform importance function, which is 1 on all nodes. In order to describe the three importance functions, we must first define some notation.

Let $n$ denote the number of elements in the poset. For a node $x$ in a decision tree, let $\mathrm{sib}(x)$ denote the number of siblings of $x$ in the tree; that is, the number of nodes which are children of the parent of $x$ . Let $\mathrm{height}(x)$ denote the height of $x$ in the tree, so that leaves have height 0 and the root has height $n$ .

Each node in the decision tree corresponds to some element in the poset, as shown, for example, in Figure 4. For a node $x$ in a decision tree, let $\mathrm{desc}(x)$ be the number of elements in the poset which are descendants of the poset element that $x$ corresponds to. Note that $\mathrm{desc}(x)$ also counts whichever poset element $x$ corresponds to, so that we always have $\mathrm{desc}(x)\geq 1$ .

Then the three importance functions are as follows.

Importance function 1:

[TABLE]

Importance function 2:

[TABLE]

Importance function 3:

[TABLE]

It is worthwhile to note that when the budget $B=1$ , each hypernode $\mathbf{x}_{i}$ chosen by Algorithm 2 is a single node, and so each $S(\mathbf{x}_{i})$ is a single sibling set. Hence the value of $\mathrm{sib}(x)$ is uniform for all $x\in S(\mathbf{x}_{i})$ for all $\mathbf{x}_{i}$ . Thus $\mathrm{sib}(x)$ will cancel out of all instances of the quotient

[TABLE]

and have no effect on the importance function. Therefore, when the budget $B=1$ , importance function 1 is the same as the uniform importance function, as can be seen in Figure 5, and importance functions 2 and 3 are the same as those described in Beichl, Jensen, and Sullivan (2017), where their properties are explored in more detail.

7.3 Methodology and Results

All numerical tests of Algorithm 2 were implemented in C++ using a sparse representation of the posets. All posets were randomly generated in the following manner. Given the poset elements $v_{1},v_{2},\dots,v_{n}$ , for each pair of elements $v_{i}$ and $v_{j}$ with $i<j$ , the relation $v_{i}>v_{j}$ was given a 20% probability to exist using a pseudo-random number generator. The posets were then transitively completed.

The first set of tests compared the relative variance of the four importance functions as a function of the size of the posets. These tests were repeated for the fixed budget values $B=1,5,10,15,20$ . At each budget size, the size of the poset, $n$ , ran through the values $10,15,20,\dots,85$ .

For each value of $B$ and $n$ , $n^{2}$ posets were generated, and $n^{2}$ estimates were performed on each poset to calculate the relative variance for that poset. The relative variance of for each poset was then averaged for each value of $n$ .

The results for the different importance functions were of differing orders of magnitude; therefore, they are compared on a log-log scale. The results are shown in Figures 5-9.

The second set of tests compared the relative variance of the four importance functions as a function of the budget $B$ . These tests were repeated for the fixed poset sizes $n=10,20,40$ . At each poset size, the budget, $B$ , ran through the values $1,2,3,\dots,100$ .

Just as in the first set of tests, for each value of $B$ and $n$ , $n^{2}$ posets were generated, and $n^{2}$ estimates were performed on each poset to calculate the relative variance for that poset. The relative variance of each poset was then averaged for each value of $B$ .

The results drop sharply as the budget grows; therefore, they are compared on a semi-log scale. The results are shown in Figures 10-12.

As was noted by Vaisman and Kroese (2017), it is possible for an importance function to make the variance worse, as we see in Figure 10 in posets with 10 nodes using importance function 1. However, at all of the poset sizes tested, importance functions 2 and 3 present significant improvements to the variance, and this serves as evidence that it is worthwhile to incorporate an importance function into Stochastic Enumeration.

The importance function which performed the best was importance function 3, with a few exceptions. For posets with less than 20 elements and a small budget of 1 or 2, importance function 2 performed slightly better. In general, the improvements in variance due to all importance functions decreased slightly as the budget increased.

8 Conclusions and Future Work

We designed and implemented two generalizations of the stochastic enumeration method for counting the leaves of trees: the first algorithm for any user-supplied probability distribution on hypernodes, and the second algorithm for a probability distribution induced by any user-supplied importance function on the nodes of the tree. We numerically tested the second algorithm on the problem of counting linear extensions of random posets, and showed that introducing an importance function can significantly reduce the variance of estimates.

Although our importance functions performed well in numerical testing, in future, we would like to explore the question of how to find better importance functions, as well as importance functions with provable performance guarantees.

Bibliography13

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Beichl and Sullivan (1999) Beichl I, Sullivan F (1999) Approximating the permanent via importance sampling with applications to the dimer covering problem. Journal of Computational Physics 149:128–147
2Beichl et al (2017) Beichl I, Jensen A, Sullivan F (2017) A sequential importance sampling algorithm for estimating linear extensions, preprint
3Blitzstein and Diaconis (2011) Blitzstein J, Diaconis P (2011) A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Mathematics 6(4):489–522
4Chen (1992) Chen PC (1992) Heuristic sampling: A method for predicting the performance of tree searching programs. SIAM Journal on Computing 21(2):295–315
5Cloteaux and Valentin (2011) Cloteaux B, Valentin LA (2011) Counting the leaves of trees. Congressus Numerantium 207:129–139
6Harris et al (2014) Harris D, Sullivan F, Beichl I (2014) Fast sequential importance sampling to estimate the graph reliability polynomial. Algorithmica 68(4):916–939
7Kahn (1962) Kahn AB (1962) Topological sorting of large networks. Commun ACM 5(11):558–562, DOI 10.1145/368996.369025
8Karp and Luby (1983) Karp RM, Luby M (1983) Monte-carlo algorithms for enumeration and reliability problems. In: Proceedings of the 24th Annual Symposium on Foundations of Computer Science, SFCS ’83, IEEE Computer Society, pp 56–64

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Stochastic Enumeration with Importance Sampling

Abstract

Acknowledgments

1 Introduction

2 Definitions and Preliminaries

Example 2.1**.**

3 Stochastic Enumeration with Arbitrary Probability

Example 3.1**.**

Lemma 3.1**.**

Proof.

Theorem 3.1**.**

Proof.

Theorem 3.2**.**

Proof.

4 Stochastic Enumeration with Importance

Theorem 4.1**.**

Proof.

Example 4.1**.**

5 Variance

Theorem 5.1**.**

Proof.

Corollary 5.1**.**

Proof.

6 Upper Bounds on the Variance

Theorem 6.1**.**

Proof.

Lemma 6.1**.**

Proof.

Theorem 6.2**.**

Proof.

Theorem 6.3**.**

Proof.

Corollary 6.1**.**

Proof.

7 Application: Counting Linear Extensions

7.1 Background

Example 7.1**.**

7.2 Importance Functions

7.3 Methodology and Results

8 Conclusions and Future Work

Example 2.1.

Example 3.1.

Lemma 3.1.

Theorem 3.1.

Theorem 3.2.

Theorem 4.1.

Example 4.1.

Theorem 5.1.

Corollary 5.1.

Theorem 6.1.

Lemma 6.1.

Theorem 6.2.

Theorem 6.3.

Corollary 6.1.

Example 7.1.