Bootstrapping Structure using Similarity
Menno van Zaanen

TL;DR
This paper introduces a similarity-based algorithm, inspired by string edit-distance, that bootstraps syntactic structure from unannotated sentences. It reaches 86.04% non-crossing brackets precision on the ATIS corpus and 89.39% on the OVIS corpus.
Contribution
The paper presents a new similarity-based bootstrapping algorithm for extracting syntactic structure from unannotated text. It compares pairs of sentences that share one or more words and treats the parts that differ as possible constituents of the same type.
Findings
Achieved 86.04% non-crossing brackets precision on the ATIS corpus.
Achieved 89.39% non-crossing brackets precision on the OVIS corpus.
Showed that the method can learn structure starting from flat (unbracketed) sentences.
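The metric named in the findings can be illustrated with a small sketch. It assumes the usual reading of non-crossing brackets precision: the fraction of learned brackets that do not cross any bracket in the reference treebank. The function names and the half-open `(start, end)` span encoding are choices made for this illustration, not taken from the paper.

```python
def crosses(x, y):
    """True if half-open word spans x and y overlap without one nesting in the other."""
    (s1, e1), (s2, e2) = x, y
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def non_crossing_precision(learned, gold):
    """Fraction of learned brackets that cross no gold bracket (assumed definition)."""
    if not learned:
        return 0.0
    ok = sum(1 for b in learned if not any(crosses(b, g) for g in gold))
    return ok / len(learned)


# Invented example spans over a five-word sentence (not corpus data).
learned = [(0, 2), (1, 4), (2, 5)]
gold = [(0, 5), (0, 2), (2, 5)]
score = non_crossing_precision(learned, gold)  # (1, 4) crosses (0, 2), so 2 of 3 remain
```

Under this reading, a learned bracket that is nested inside a gold bracket, or that merely touches one at a boundary, still counts as correct; only partial overlaps are penalised.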
Abstract
In this paper a new similarity-based learning algorithm, inspired by string edit-distance (Wagner and Fischer, 1974), is applied to the problem of bootstrapping structure from scratch. The algorithm takes a corpus of unannotated sentences as input and returns a corpus of bracketed sentences. The method works on pairs of unstructured sentences or sentences partially bracketed by the algorithm that have one or more words in common. It finds parts of sentences that are interchangeable (i.e. the parts of the sentences that are different in both sentences). These parts are taken as possible constituents of the same type. While this corresponds to the basic bootstrapping step of the algorithm, further structure may be learned from comparison with other (similar) sentences. We used this method for bootstrapping structure from the flat sentences of the Penn Treebank ATIS corpus, and compared…
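The basic bootstrapping step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes a word-level edit distance with insertions and deletions only (cost 1 each), and the names `align` and `unequal_parts` as well as the example sentences are invented for this sketch.

```python
def align(a, b):
    """Return index pairs (i, j) of matched words, found via a word-level edit distance."""
    n, m = len(a), len(b)
    # dist[i][j] = cost of turning a[:i] into b[:j] with insertions/deletions only
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dist[i][j] = dist[i - 1][j - 1]
            else:
                dist[i][j] = 1 + min(dist[i - 1][j], dist[i][j - 1])
    # Trace back from the bottom-right corner to recover the matched words.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dist[i - 1][j] <= dist[i][j - 1]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def unequal_parts(a, b):
    """Pairs of differing word sequences, taken as candidate constituents of the same type."""
    pairs = align(a, b)
    if not pairs:  # the method needs at least one word in common
        return []
    bounds = [(-1, -1)] + pairs + [(len(a), len(b))]
    parts = []
    for (i0, j0), (i1, j1) in zip(bounds, bounds[1:]):
        part_a, part_b = a[i0 + 1:i1], b[j0 + 1:j1]
        if part_a or part_b:
            parts.append((part_a, part_b))
    return parts


s1 = "show me flights from boston".split()
s2 = "show me fares from dallas".split()
print(unequal_parts(s1, s2))
# [(['flights'], ['fares']), (['boston'], ['dallas'])]
```

Each returned pair holds the words that sit in the same slot between shared words, which is what makes them interchangeable in the sense used above. Learning further structure from partially bracketed sentences, as the abstract mentions, is not covered by this sketch.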
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Machine Learning and Algorithms
