Bootstrapping Syntax and Recursion using Alignment-Based Learning
Menno van Zaanen

TL;DR
This paper presents an unsupervised, alignment-based algorithm that induces syntactic structure, including recursion, from unstructured natural language corpora, and reports promising results on the ATIS and OVIS corpora.
Contribution
It introduces an alignment-based learning method that induces syntax and recursion without supervision, using pairwise sentence alignment and Harris's notion of interchangeability.
Findings
Successfully induced syntactic constituents from untagged corpora
Achieved promising numerical results on ATIS and OVIS datasets
Demonstrated that even simple alignment algorithms can learn recursion
Abstract
This paper introduces a new type of unsupervised learning algorithm, based on the alignment of sentences and Harris's (1951) notion of interchangeability. The algorithm is applied to an untagged, unstructured corpus of natural language sentences, resulting in a labelled, bracketed version of the corpus. Firstly, the algorithm aligns all sentences in the corpus in pairs, resulting in a partition of the sentences consisting of parts of the sentences that are similar in both sentences and parts that are dissimilar. This information is used to find (possibly overlapping) constituents. Next, the algorithm selects (non-overlapping) constituents. Several instances of the algorithm are applied to the ATIS corpus (Marcus et al., 1993) and the OVIS corpus (Bonnema et al., 1997); OVIS (Openbaar Vervoer Informatie Systeem) stands for Public Transport Information System. Apart from the promising…
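The alignment step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes whitespace-tokenized sentences, swaps in Python's standard `difflib.SequenceMatcher` as the alignment routine, and skips dissimilar parts that are empty on one side. The function name `hypothesize_constituents` and the example sentences are hypothetical. It covers only the first phase (finding candidate constituents), not the later selection among overlapping ones.

```python
from difflib import SequenceMatcher


def hypothesize_constituents(sent_a, sent_b):
    """Align two tokenized sentences and return their dissimilar parts.

    Each result is ((a_start, a_end), (b_start, b_end)), a pair of
    half-open token spans that occur in the same context and are
    therefore treated as candidate (interchangeable) constituents.
    """
    # Assumption: difflib's matcher stands in for the alignment algorithm.
    matcher = SequenceMatcher(a=sent_a, b=sent_b, autojunk=False)
    pairs = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        # "equal" blocks are the shared context; a "replace" block is a
        # pair of dissimilar parts. Pure insertions or deletions (one
        # side empty) are skipped in this simplified sketch.
        if tag == "replace":
            pairs.append(((a0, a1), (b0, b1)))
    return pairs


# Hypothetical example sentences in the style of a travel-query corpus.
a = "show me flights from Boston to Dallas".split()
b = "show me flights from Atlanta to Denver".split()

for (a0, a1), (b0, b1) in hypothesize_constituents(a, b):
    print(" ".join(a[a0:a1]), "<->", " ".join(b[b0:b1]))
```

Running the pairwise step over every sentence pair in a corpus would yield many such candidate spans per sentence, some of which overlap; that is the input to the selection phase the abstract mentions next.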
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
