CRYSTAL: Inducing a Conceptual Dictionary
Stephen Soderland, David Fisher, Jonathan Aseltine, Wendy Lehnert (University of Massachusetts)

TL;DR
CRYSTAL automatically induces a compact dictionary of concept-node definitions from a training corpus for use in information extraction. Because it tests each proposed definition for errors, it can often surpass human intuitions about which extraction rules are reliable.
Contribution
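The paper introduces CRYSTAL, a system that automatically induces a dictionary of concept-node definitions for information extraction. Each definition is generalized as far as possible without producing errors, so that few entries are needed to cover the positive training instances.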
Findings
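CRYSTAL tests the accuracy of each proposed definition, which keeps unreliable rules out of the dictionary.
Because of this testing, CRYSTAL can often surpass human intuitions in creating reliable extraction rules.
Generalizing each definition as far as possible without errors keeps the number of entries needed to cover the positive training instances to a minimum.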
Abstract
One of the central knowledge sources of an information extraction system is a dictionary of linguistic patterns that can be used to identify the conceptual content of a text. This paper describes CRYSTAL, a system which automatically induces a dictionary of "concept-node definitions" sufficient to identify relevant information from a training corpus. Each of these concept-node definitions is generalized as far as possible without producing errors, so that a minimum number of dictionary entries cover the positive training instances. Because it tests the accuracy of each proposed definition, CRYSTAL can often surpass human intuitions in creating reliable extraction rules.
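The idea of generalizing each definition as far as possible without producing errors can be illustrated with a toy covering loop. This is a minimal sketch under stated assumptions, not the authors' algorithm: the feature-set encoding, the function names (`covers`, `errors`, `induce`), the `tolerance` parameter, and the sample instances are all hypothetical. Each definition starts as one positive instance, is relaxed by intersecting it with other positives, and a relaxation is kept only if it covers no more than `tolerance` negative instances.

```python
from typing import FrozenSet, List

# Hypothetical encoding: an instance is a set of "slot=value" features,
# and a definition is a set of constraints that must all be present.
Instance = FrozenSet[str]
Definition = FrozenSet[str]


def covers(definition: Definition, instance: Instance) -> bool:
    """A definition covers an instance if every constraint appears in it."""
    return definition <= instance


def errors(definition: Definition, negatives: List[Instance]) -> int:
    """Count the negative instances that the definition wrongly covers."""
    return sum(covers(definition, n) for n in negatives)


def induce(positives: List[Instance], negatives: List[Instance],
           tolerance: int = 0) -> List[Definition]:
    """Greedy covering loop: generalize each seed until the error test fails."""
    dictionary: List[Definition] = []
    uncovered = list(positives)
    while uncovered:
        definition = uncovered[0]  # most specific start: the seed instance itself
        while True:
            # Relax by keeping only constraints shared with another positive.
            candidates = [definition & p for p in uncovered
                          if not covers(definition, p)]
            # Error test: discard empty or over-general candidates.
            candidates = [c for c in candidates
                          if c and errors(c, negatives) <= tolerance]
            if not candidates:
                break
            # Take the most conservative passing relaxation (deterministic tie-break).
            definition = max(candidates, key=lambda c: (len(c), sorted(c)))
        dictionary.append(definition)
        uncovered = [p for p in uncovered if not covers(definition, p)]
    return dictionary


# Invented sample data, for illustration only.
positives = [
    frozenset({"verb=denies", "obj_class=Symptom", "obj_word=pain"}),
    frozenset({"verb=denies", "obj_class=Symptom", "obj_word=nausea"}),
    frozenset({"verb=reports", "obj_class=Symptom", "obj_word=fever"}),
]
negatives = [
    frozenset({"verb=reports", "obj_class=Medication", "obj_word=aspirin"}),
    frozenset({"verb=discussed", "obj_class=Symptom", "obj_word=pain"}),
]

dictionary = induce(positives, negatives)
for entry in dictionary:
    print(sorted(entry))
```

On this sample data the first two positives merge into one entry requiring only `verb=denies` and `obj_class=Symptom`. Relaxing further to `obj_class=Symptom` alone would cover a negative instance, so the error test rejects it and the third positive stays as its own entry. Two entries therefore cover three positives with no errors.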
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Natural Language Processing Techniques · Semantic Web and Ontologies
