Noun-Phrase Analysis in Unrestricted Text for Information Retrieval

David A. Evans; Chengxiang Zhai (Carnegie Mellon University)

arXiv:cmp-lg/9605019 · cmp-lg · February 3, 2008 · 95 cites

TL;DR

This paper presents a hybrid noun-phrase analysis method that extracts meaningful subcompounds from complex noun phrases in unrestricted natural-language text. Using these subcompounds as indexing phrases improves both recall and precision in information retrieval.

Contribution

The paper introduces a hybrid approach to noun-phrase analysis that combines corpus statistics with linguistic heuristics to extract continuous and discontinuous subcompounds, which then serve as indexing phrases for information retrieval.

Findings

1. Indexing with extracted subcompounds improves both recall and precision.

2. The techniques are potentially useful for book indexing and automatic thesaurus extraction.

3. The techniques are simple, robust, and efficient enough to process large quantities of unrestricted natural-language text.
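The first finding is stated in terms of the two standard retrieval measures. As a reminder of what they count, here is a minimal set-based sketch for a single query: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The function name `precision_recall` and the document IDs are hypothetical, and the paper's own evaluation protocol (for example, any averaging over queries or rank cutoffs) is not described on this page.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical document IDs for a single query.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```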

Abstract

Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple, yet robust and efficient noun-phrase analysis techniques to create better indexing phrases for information retrieval. In particular, we describe a hybrid approach to the extraction of meaningful (continuous or discontinuous) subcompounds from complex noun phrases using both corpus statistics and linguistic heuristics. Results of experiments show that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system. The noun-phrase analysis techniques are also potentially useful for book indexing and automatic thesaurus extraction.
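The abstract does not spell out the extraction procedure, so the following is only a minimal sketch of what a hybrid subcompound extractor could look like, not the authors' algorithm. It assumes that noun phrases arrive already tokenized, that the corpus statistic is simply how often a word pair occurs adjacently inside corpus noun phrases, and that the linguistic heuristics are (a) the last word of a noun phrase is its head and (b) a small stop list of weak modifiers never forms a subcompound. The names `candidate_pairs`, `pair_counts`, `extract_subcompounds`, `WEAK_MODIFIERS`, `min_count`, and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical linguistic heuristic: modifiers too vague to index.
WEAK_MODIFIERS = {"new", "other", "various", "several"}


def candidate_pairs(np_words):
    """All order-preserving word pairs (w_i, w_j), i < j, in a noun phrase.

    Adjacent pairs are continuous subcompounds; pairs with words between
    them are discontinuous subcompounds.
    """
    return [(np_words[i], np_words[j])
            for i, j in combinations(range(len(np_words)), 2)]


def pair_counts(corpus_nps):
    """Assumed corpus statistic: adjacent-pair frequencies in corpus noun phrases."""
    counts = Counter()
    for words in corpus_nps:
        counts.update(zip(words, words[1:]))
    return counts


def extract_subcompounds(np_words, counts, min_count=2):
    """Keep candidate pairs that pass the heuristic filter and either the
    corpus test or the head heuristic."""
    head = np_words[-1]  # assumed: the head is the final word
    kept = []
    for left, right in candidate_pairs(np_words):
        if left in WEAK_MODIFIERS:
            continue  # heuristic filter on weak modifiers
        attested = counts[(left, right)] >= min_count  # corpus evidence
        if attested or right == head:  # pairs ending in the head are kept
            kept.append(f"{left} {right}")
    return kept


# Toy corpus of pre-tokenized noun phrases (illustrative only).
CORPUS = [
    ["information", "retrieval", "system"],
    ["information", "retrieval"],
    ["retrieval", "system"],
    ["text", "retrieval"],
    ["information", "retrieval", "research"],
]

if __name__ == "__main__":
    counts = pair_counts(CORPUS)
    print(extract_subcompounds(["new", "information", "retrieval", "system"], counts))
    # prints ['information retrieval', 'information system', 'retrieval system']
```

On the toy corpus, the phrase "new information retrieval system" yields the continuous subcompounds "information retrieval" and "retrieval system" plus the discontinuous "information system", while "new" is dropped by the stop list. A fuller treatment would likely rank candidates with an association score rather than a raw count threshold.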

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques