Fast Statistical Parsing of Noun Phrases for Document Indexing
Chengxiang Zhai (Carnegie Mellon University)

TL;DR
This paper introduces a fast probabilistic noun phrase parser that improves document indexing and retrieval performance in large-scale information retrieval systems by incorporating syntactic phrases.
Contribution
A novel efficient probabilistic model for noun phrase parsing is developed and applied to enhance document indexing in large-scale IR systems.
Findings
Syntactic phrases improve retrieval performance.
Supplementing words with phrases yields significant gains.
The parser is effective on a large 250 MB document collection.
Abstract
Information Retrieval (IR) is an important application area of Natural Language Processing (NLP) where one encounters the genuine challenge of processing large quantities of unrestricted natural language text. While much effort has been made to apply NLP techniques to IR, very few NLP techniques have been evaluated on a document collection larger than several megabytes. Many NLP techniques are simply not efficient enough, and not robust enough, to handle a large amount of text. This paper proposes a new probabilistic model for noun phrase parsing, and reports on the application of such a parsing technique to enhance document indexing. The effectiveness of using syntactic phrases provided by the parser to supplement single words for indexing is evaluated with a 250-megabyte document collection. The experimental results show that supplementing single words with syntactic phrases for…
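As a rough illustration of what "supplementing single words with syntactic phrases" can mean at indexing time, the sketch below builds a toy inverted index in which each document contributes both its single-word terms and its phrase terms. It is not the paper's parser or indexing pipeline: the phrases are supplied by hand as a stand-in for parser output, and the names `index_terms` and `build_index`, the underscore-joined phrase format, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def index_terms(tokens, phrases):
    """Return the indexing terms for one document: single words plus phrases.

    `phrases` is a list of word tuples. Here they are given by hand; in a
    real system they would come from a noun phrase parser (assumption).
    """
    terms = [t.lower() for t in tokens]
    # Join each phrase into one term so it is indexed alongside the words.
    terms += ["_".join(w.lower() for w in phrase) for phrase in phrases]
    return terms


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, (tokens, phrases) in docs.items():
        for term in index_terms(tokens, phrases):
            index[term].add(doc_id)
    return index


# Hypothetical documents: (tokens, hand-supplied noun phrases).
docs = {
    "d1": (["fast", "statistical", "parsing"], [("statistical", "parsing")]),
    "d2": (["statistical", "methods", "for", "parsing"], []),
}
index = build_index(docs)
```

In this toy setup both documents match the single words `statistical` and `parsing`, but only `d1` matches the phrase term `statistical_parsing`, which is the kind of extra discrimination that phrase terms are meant to add on top of word terms.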
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Information Retrieval and Search Behavior
