Bootstrapping Syntax and Recursion using Alignment-Based Learning

Menno van Zaanen

arXiv:cs/0104007 · cs.LG · September 25, 2009 · 6 cites

TL;DR

This paper presents an unsupervised, alignment-based algorithm that induces syntactic structure and recursion from an untagged, unstructured natural language corpus, with promising results on the ATIS and OVIS corpora.

Contribution

It introduces a novel alignment-based learning method that induces syntax and recursion without supervision, using pairwise sentence alignment and the notion of interchangeability.

Findings

01. Successfully induced syntactic constituents from untagged corpora

02. Achieved promising numerical results on the ATIS and OVIS corpora

03. Demonstrated that even simple alignment algorithms can learn recursion

Abstract

This paper introduces a new type of unsupervised learning algorithm, based on the alignment of sentences and Harris's (1951) notion of interchangeability. The algorithm is applied to an untagged, unstructured corpus of natural language sentences, resulting in a labelled, bracketed version of the corpus. First, the algorithm aligns all sentences in the corpus in pairs, which partitions each sentence into parts that are similar in both sentences and parts that are dissimilar. This information is used to find (possibly overlapping) constituents. Next, the algorithm selects (non-overlapping) constituents. Several instances of the algorithm are applied to the ATIS corpus (Marcus et al., 1993) and the OVIS corpus (Bonnema et al., 1997); OVIS stands for Openbaar Vervoer Informatie Systeem (Public Transport Information System). Apart from the promising…
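The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: `difflib.SequenceMatcher` from the standard library stands in for whatever alignment method the paper uses, the function names (`candidate_constituents`, `crosses`, `select_non_crossing`) are hypothetical, the example sentences are invented, and the "first hypothesis wins" selection policy is an assumption chosen for simplicity.

```python
from difflib import SequenceMatcher


def candidate_constituents(sent_a, sent_b):
    """Align two sentences word by word and return the dissimilar spans.

    Each span is a (start, end) pair of word indices, end exclusive.
    Dissimilar parts sit in the same context in both sentences, so they
    are treated as interchangeable and hypothesized to be constituents.
    """
    a, b = sent_a.split(), sent_b.split()
    # Assumption: SequenceMatcher is a stand-in aligner for illustration.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    spans_a, spans_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # similar parts are context, not hypotheses
        if i2 > i1:
            spans_a.append((i1, i2))
        if j2 > j1:
            spans_b.append((j1, j2))
    return spans_a, spans_b


def crosses(x, y):
    """True when two spans overlap without one containing the other."""
    (s1, e1), (s2, e2) = x, y
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def select_non_crossing(spans):
    """Keep spans in the order given, skipping any that cross a kept span.

    Assumption: earlier hypotheses take priority; other policies are possible.
    """
    kept = []
    for span in spans:
        if not any(crosses(span, k) for k in kept):
            kept.append(span)
    return kept


if __name__ == "__main__":
    spans_a, spans_b = candidate_constituents(
        "show me flights from Boston to Dallas",
        "show me flights from Denver to Atlanta",
    )
    print(spans_a, spans_b)
    print(select_non_crossing([(0, 3), (2, 5), (1, 2)]))
```

In the example pair, the shared words ("show me flights from", "to") form the context, and the words that differ in each slot ("Boston"/"Denver", "Dallas"/"Atlanta") become candidate constituents. Nested spans are kept by the selection step, since nesting is what allows a bracketing to express hierarchy; only crossing spans are rejected.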

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling