Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics

Preslav Nakov

arXiv:1912.01113 · cs.CL · December 4, 2019 · 5 cites

Open Access

TL;DR

This paper presents novel unsupervised and lightly supervised methods leveraging the Web as a corpus to analyze noun compound syntax and semantics, achieving state-of-the-art results and improving related NLP tasks.

Contribution

It introduces new surface features and paraphrases for noun compound analysis using Web data, enhancing syntactic disambiguation and semantic understanding.

Findings

1. State-of-the-art accuracy in noun compound bracketing
2. Effective application of features to prepositional phrase attachment
3. Improved machine translation through paraphrasing techniques

Abstract

An important characteristic of English written text is the abundance of noun compounds: sequences of nouns acting as a single noun, e.g., colon cancer tumor suppressor protein. While eventually mastered by domain experts, their interpretation poses a major challenge for automated analysis. Understanding noun compounds' syntax and semantics is important for many natural language applications, including question answering, machine translation, information retrieval, and information extraction. I address the problem of noun compound syntax by means of novel, highly accurate unsupervised and lightly supervised algorithms using the Web as a corpus and search engines as interfaces to that corpus. Traditionally the Web has been viewed as a source of page hit counts, used as an estimate for n-gram word frequencies. I extend this approach by introducing novel surface features and paraphrases,…
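As a rough illustration of the traditional hit-count view mentioned in the abstract, the sketch below brackets a three-word noun compound by comparing bigram frequencies, in the style of the standard adjacency model. It is not the paper's method, which goes beyond raw counts. The function name `bracket_adjacency`, the table `TOY_BIGRAM_COUNTS`, and every number in it are hypothetical stand-ins for search-engine page hits, and breaking ties toward left bracketing is an assumption of this sketch.

```python
from typing import Dict, Tuple

# Hypothetical toy counts standing in for Web page hits.
# The numbers are invented for illustration, not measured.
TOY_BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("liver", "cell"): 900,
    ("cell", "antibody"): 120,
    ("cell", "line"): 5000,
}


def bracket_adjacency(
    w1: str, w2: str, w3: str, counts: Dict[Tuple[str, str], int]
) -> str:
    """Bracket a three-word compound by comparing adjacent bigram counts.

    Left bracketing [[w1 w2] w3] is chosen when count(w1 w2) is at least
    count(w2 w3); otherwise right bracketing [w1 [w2 w3]] is chosen.
    Unseen bigrams count as zero, and ties go left (an assumption).
    """
    left = counts.get((w1, w2), 0)
    right = counts.get((w2, w3), 0)
    if left >= right:
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"


if __name__ == "__main__":
    print(bracket_adjacency("liver", "cell", "antibody", TOY_BIGRAM_COUNTS))
    print(bracket_adjacency("liver", "cell", "line", TOY_BIGRAM_COUNTS))
```

With these toy counts the first compound is bracketed to the left and the second to the right. In a real setting the counts would come from a search engine or an n-gram corpus, and such estimates can be noisy.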

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification