Bootstrapping Lexical Choice via Multiple-Sequence Alignment

Regina Barzilay; Lillian Lee

arXiv:cs/0205065 · cs.CL · May 23, 2007 · 6 cites

TL;DR

This paper introduces an automatic method for building the mapping lexicon of a natural language generation system by applying multiple-sequence alignment to multi-parallel corpora, in place of labor-intensive knowledge-based construction.

Contribution

It presents a new multiple-pass alignment algorithm that leverages multi-parallel datasets to automatically acquire lexicons, reducing reliance on labor-intensive knowledge-based methods.

Findings

01. Generated natural language versions of computer-generated mathematical proofs with high readability.

02. Achieved faithfulness to the semantic input comparable to that of traditional systems.

03. Demonstrated effectiveness in evaluations involving a dozen human judges.

Abstract

An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically, labor-intensive knowledge-based methods are used to construct the dictionary. We instead propose to acquire it automatically via a novel multiple-pass algorithm employing multiple-sequence alignment, a technique commonly used in bioinformatics. Crucially, our method leverages latent information contained in multi-parallel corpora -- datasets that supply several verbalizations of the corresponding semantics rather than just one. We used our techniques to generate natural language versions of computer-generated mathematical proofs, with good results on both a per-component and overall-output basis. For example, in evaluations involving a dozen human judges, our system produced output whose readability and…
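
To make the alignment idea concrete, here is a minimal sketch of aligning two verbalizations of the same semantics token by token. It is not the authors' multiple-pass algorithm: it uses plain pairwise global alignment (Needleman-Wunsch) rather than multiple-sequence alignment, and the function name `align`, the scoring values, and the two example sentences are all assumptions made for illustration. Aligned columns where the tokens agree hint at shared structure, while columns where they differ or hit a gap hint at alternative lexical choices.

```python
from typing import List, Tuple

GAP = "-"  # placeholder for a token with no counterpart


def align(a: List[str], b: List[str],
          match: int = 1, mismatch: int = -1, gap: int = -1
          ) -> List[Tuple[str, str]]:
    """Globally align two token sequences (Needleman-Wunsch).

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    # Two invented verbalizations of the same proof step.
    s1 = "we assume that a equals b".split()
    s2 = "suppose that a equals b".split()
    for left, right in align(s1, s2):
        marker = "shared" if left == right else "varies"
        print(f"{left:>8} | {right:<8} {marker}")
```

A full multiple-sequence alignment would combine more than two verbalizations at once; the pairwise case is shown only because it exposes the same agree/differ column structure in the smallest possible setting.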

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems