Efficient Sampling of Dependency Structures
Ran Zmigrod, Tim Vieira, Ryan Cotterell

TL;DR
This paper introduces algorithms for sampling dependency trees subject to a root constraint in NLP, including a novel method for efficiently sampling multiple trees without replacement.
Contribution
It adapts existing spanning tree sampling algorithms to handle root constraints and proposes a new algorithm for sampling multiple trees without replacement.
Findings
Wilson's algorithm runs in O(H) time, where H is the mean hitting time.
Colbourn's algorithm has a runtime of O(N^3), often exceeding the mean hitting time.
The new method samples K trees without replacement in O(K N^3 + K^2 N) time.
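To make the O(H) finding more concrete, here is a minimal Python sketch of the classic form of Wilson's algorithm, which builds a spanning tree from loop-erased random walks. It is the unconstrained sampler: it does not enforce the root constraint, so it should not be read as the paper's adapted version. The function name `wilson_sample`, the `weights[h][d]` layout (weight of the edge from head h to dependent d), and the use of node 0 as the root are assumptions made for this illustration. It also assumes every non-root node has at least one incoming edge with positive weight and that the root is reachable.

```python
import random


def wilson_sample(weights, root=0, rng=None):
    """Sample a spanning tree rooted at `root` (no root constraint enforced).

    weights[h][d] is the non-negative weight of the edge head h -> dependent d
    (hypothetical layout chosen for this sketch).
    Returns heads, where heads[d] is the head of node d and heads[root] is None.
    """
    rng = rng or random.Random()
    n = len(weights)
    heads = [None] * n
    in_tree = [False] * n
    in_tree[root] = True

    for start in range(n):
        # Random walk from `start` towards the current tree. Each step picks
        # a head for the current node with probability proportional to the
        # incoming edge weight. Overwriting heads[node] on a revisit erases
        # the loop implicitly.
        node = start
        while not in_tree[node]:
            candidates = [h for h in range(n) if h != node]
            w = [weights[h][node] for h in candidates]
            heads[node] = rng.choices(candidates, weights=w)[0]
            node = heads[node]

        # Retrace the loop-erased path and add its nodes to the tree.
        node = start
        while not in_tree[node]:
            in_tree[node] = True
            node = heads[node]

    return heads


if __name__ == "__main__":
    uniform = [[1.0] * 4 for _ in range(4)]
    print(wilson_sample(uniform, root=0, rng=random.Random(0)))
```

Each walk runs until it first hits the tree built so far, which is why the running time is governed by hitting times rather than by a fixed polynomial in the graph size.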
Abstract
Probabilistic distributions over spanning trees in directed graphs are a fundamental model of dependency structure in natural language processing, e.g., syntactic dependency trees. In NLP, dependency trees often have an additional root constraint: only one edge may emanate from the root. However, no sampling algorithm has been presented in the literature to account for this additional constraint. In this paper, we adapt two spanning tree sampling algorithms to faithfully sample dependency trees from a graph subject to the root constraint. Wilson's (1996) sampling algorithm has a running time of O(H), where H is the mean hitting time of the graph. Colbourn's (1996) sampling algorithm has a running time of O(N^3), which is often greater than the mean hitting time of a directed graph. Additionally, we build upon Colbourn's algorithm and present a novel extension that can…
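The root constraint described in the abstract can be illustrated with a small brute-force enumeration in Python. This is only a sketch for tiny inputs, not one of the paper's sampling algorithms. The names `is_tree` and `enumerate_trees`, the head-list encoding, and the use of node 0 as the root are assumptions made for the example.

```python
from itertools import product


def is_tree(heads, root=0):
    """Check that following heads from every node reaches the root without a cycle."""
    for d in range(len(heads)):
        seen = set()
        node = d
        while node != root:
            if node in seen:
                return False  # cycle that never reaches the root
            seen.add(node)
            node = heads[node]
    return True


def enumerate_trees(n_words, single_root=False):
    """Enumerate dependency trees over n_words words plus a root node 0.

    heads[d] is the head of word d; heads[0] is None.
    With single_root=True, keep only trees with exactly one edge leaving the root.
    """
    n = n_words + 1
    trees = []
    for assignment in product(range(n), repeat=n_words):
        heads = [None] + list(assignment)
        if not is_tree(heads):
            continue
        root_edges = sum(1 for h in heads[1:] if h == 0)
        if single_root and root_edges != 1:
            continue
        trees.append(heads)
    return trees


if __name__ == "__main__":
    print(len(enumerate_trees(3)))                    # all spanning trees
    print(len(enumerate_trees(3, single_root=True)))  # root constraint applied
```

For three words, the enumeration finds 16 spanning trees in total, of which 9 satisfy the root constraint, so a sampler that ignores the constraint would place probability mass on trees that are not valid dependency trees.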
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Data Mining Algorithms and Applications
