Corpus-Driven Knowledge Acquisition for Discourse Analysis
Stephen Soderland, Wendy Lehnert (University of Massachusetts)

TL;DR
This paper explores how machine learning techniques can leverage large text corpora to automatically acquire knowledge for discourse analysis and information extraction, reducing manual effort and improving scalability across domains.
Contribution
It demonstrates the use of ML algorithms to capture implicit domain knowledge from text corpora for discourse analysis and IE, bypassing the need for extensive hand-coded heuristics.
Findings
ML supports knowledge acquisition at levels of language analysis above part-of-speech tagging
ML reduces the manual effort of porting IE systems to new domains
ML improves the scalability and portability of information extraction
Abstract
The availability of large on-line text corpora provides a natural and promising bridge between the worlds of natural language processing (NLP) and machine learning (ML). In recent years, the NLP community has been aggressively investigating statistical techniques to drive part-of-speech taggers, but application-specific text corpora can be used to drive knowledge acquisition at much higher levels as well. In this paper we will show how ML techniques can be used to support knowledge acquisition for information extraction systems. It is often very difficult to specify an explicit domain model for many information extraction applications, and it is always labor intensive to implement hand-coded heuristics for each new domain. We have discovered that it is nevertheless possible to use ML algorithms in order to capture knowledge that is only implicitly present in a representative text…
Taxonomy
Topics: Natural Language Processing Techniques · Data Mining Algorithms and Applications · Topic Modeling
