Unbiased and Efficient Sampling of Dependency Trees
Miloš Stanojević

TL;DR
This paper addresses the challenge of sampling dependency trees with a single-root constraint, identifying bias in existing algorithms and proposing unbiased, more efficient methods for sampling with and without replacement.
Contribution
It reveals bias in the Wilson-RC algorithm and introduces two unbiased sampling algorithms, along with two efficient algorithms for sampling multiple trees without replacement.
Findings
Wilson-RC produces biased samples
Two proposed alternatives sample with replacement without bias
Two further algorithms (one incremental, one parallel) reduce the asymptotic runtime of sampling multiple trees without replacement
Abstract
Most computational models of dependency syntax consist of distributions over spanning trees. However, the majority of dependency treebanks require that every valid dependency tree has a single edge coming out of the ROOT node, a constraint that is not part of the definition of spanning trees. For this reason, all standard inference algorithms for spanning trees are suboptimal for inference over dependency trees. Zmigrod et al. (2021b) proposed algorithms for sampling with and without replacement from the dependency tree distribution that incorporate the single-root constraint. In this paper we show that their fastest algorithm for sampling with replacement, Wilson-RC, in fact produces biased samples, and we provide two alternatives that are unbiased. Additionally, we propose two algorithms (one incremental, one parallel) that reduce the asymptotic runtime of the algorithm for sampling k…
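To make the single-root constraint concrete, the following is a minimal brute-force sketch, not any of the algorithms discussed in the paper. It enumerates every spanning tree rooted at ROOT (node 0) for a short sentence, keeps only the trees with exactly one edge leaving ROOT, and draws an exact sample from that restricted set. All function names and the `weights[head][word]` layout are assumptions made for illustration, and the enumeration is exponential in sentence length, so it is only useful as a reference on toy inputs.

```python
import itertools
import random


def is_tree(heads):
    """heads[i - 1] is the head of word i; node 0 is ROOT.

    Returns True if every word reaches ROOT without visiting a cycle
    (a self-loop counts as a cycle).
    """
    for word in range(1, len(heads) + 1):
        seen = set()
        node = word
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def spanning_trees(n):
    """All spanning trees rooted at ROOT over n words, as head tuples."""
    candidates = itertools.product(range(n + 1), repeat=n)
    return [heads for heads in candidates if is_tree(heads)]


def root_edges(heads):
    """Number of edges leaving ROOT."""
    return sum(1 for head in heads if head == 0)


def tree_weight(heads, weights):
    """Product of edge weights; weights[head][word] is an assumed layout."""
    total = 1.0
    for word, head in enumerate(heads, start=1):
        total *= weights[head][word]
    return total


def sample_single_root(n, weights, rng):
    """Exact sample from the distribution restricted to single-root trees."""
    trees = [t for t in spanning_trees(n) if root_edges(t) == 1]
    scores = [tree_weight(t, weights) for t in trees]
    return rng.choices(trees, weights=scores, k=1)[0]
```

For a three-word sentence this enumeration finds 16 spanning trees, of which only 9 satisfy the single-root constraint, so a sampler that targets the unconstrained spanning tree distribution places probability mass on trees that are not valid dependency trees.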
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Bayesian Modeling and Causal Inference
