Data-Oriented Language Processing. An Overview

Rens Bod; Remko Scha (University of Amsterdam)

arXiv:cmp-lg/9611003 · cmp-lg · February 3, 2008 · 5 cites

TL;DR

This paper reviews the data-oriented processing (DOP) approach to language processing, which analyzes new utterances by combining fragments drawn from a large corpus of annotated phrase-structure trees and uses fragment probabilities to choose among the resulting analyses.

Contribution

It discusses in depth one DOP model based on labeled phrase-structure trees and surveys other models that use different criteria and formalisms.

Findings

01. DOP models use corpus fragments to estimate analysis probabilities
02. Different models vary in fragment extraction and disambiguation strategies
03. Richer corpus annotations can enhance model performance
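
To make the first finding concrete, the following is a minimal sketch of how fragment probabilities might be estimated from a toy treebank. It assumes the relative-frequency estimate commonly associated with the first DOP model (a fragment's count divided by the total count of fragments sharing its root label); this page does not spell out that formula. The tuple encoding of trees, the two-sentence treebank, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding: a tree is (label, child, ...); a word is a plain string.
# In a fragment, a one-element tuple such as ("NP",) marks an open substitution site.

def fragments_rooted_at(node):
    """All fragments whose root is this node: each nonterminal child is
    either cut off (left as an open site) or expanded further."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # words are always kept
        else:
            options.append([(child[0],)] + fragments_rooted_at(child))
    return [(label, *combo) for combo in product(*options)]

def all_fragments(tree):
    """Fragments rooted at every nonterminal node of the tree."""
    if isinstance(tree, str):
        return []
    frags = fragments_rooted_at(tree)
    for child in tree[1:]:
        frags.extend(all_fragments(child))
    return frags

def fragment_probabilities(treebank):
    """Assumed estimator: count(fragment) / total count of fragments
    with the same root label."""
    counts = Counter(f for tree in treebank for f in all_fragments(tree))
    root_totals = Counter()
    for frag, n in counts.items():
        root_totals[frag[0]] += n
    return {frag: n / root_totals[frag[0]] for frag, n in counts.items()}

# Toy treebank (hypothetical data, not taken from the paper).
treebank = [
    ("S", ("NP", "Mary"), ("VP", "sleeps")),
    ("S", ("NP", "John"), ("VP", "sleeps")),
]
probs = fragment_probabilities(treebank)
```

Under these assumptions, the probabilities of all fragments with the same root label sum to one. A full DOP model would go on to combine fragments into derivations and score whole analyses, which this sketch does not attempt.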

Abstract

During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will call "data-oriented processing" or "DOP", embodies the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract linguistic rules. The models that instantiate this approach therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems