Distributional Framework for Emergent Knowledge Acquisition and its Application to Automated Document Annotation

Vit Novacek

arXiv:1210.3241·cs.AI·October 12, 2012

Open Access

TL;DR

This paper presents a tensor-based distributional framework for extracting emergent knowledge from large text corpora, demonstrated through automated annotation of biomedical articles with MeSH terms.

Contribution

It introduces a novel tensor-based distributional approach for unsupervised knowledge acquisition from text, applied to biomedical document annotation.

Findings

01 Successfully inferred implicit term relationships

02 Generated conjunctive IF-THEN rules from text

03 Enhanced biomedical article annotation accuracy

Abstract

The paper introduces a framework for representation and acquisition of knowledge emerging from large samples of textual data. We utilise a tensor-based, distributional representation of simple statements extracted from text, and show how one can use the representation to infer emergent knowledge patterns from the textual data in an unsupervised manner. Examples of the patterns we investigate in the paper are implicit term relationships or conjunctive IF-THEN rules. To evaluate the practical relevance of our approach, we apply it to annotation of life science articles with terms from MeSH (a controlled biomedical vocabulary and thesaurus).
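
To make the idea of a tensor-based, distributional representation of simple statements more concrete, here is a minimal sketch. It is not the paper's implementation. It assumes that statements are (subject, predicate, object) triples with a numeric weight, that the three-way tensor is stored sparsely as a dictionary, and that implicit term relationships can be approximated by cosine similarity between subject vectors. The toy statements, the function names, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Toy statements with made-up weights (hypothetical data, not from the paper).
STATEMENTS = [
    ("aspirin", "treats", "headache", 1.0),
    ("aspirin", "is_a", "drug", 1.0),
    ("ibuprofen", "treats", "headache", 1.0),
    ("ibuprofen", "is_a", "drug", 1.0),
    ("headache", "is_a", "symptom", 1.0),
]


def build_tensor(statements):
    """Store a sparse 3-way tensor as a mapping (subject, predicate, object) -> weight."""
    tensor = defaultdict(float)
    for subj, pred, obj, weight in statements:
        tensor[(subj, pred, obj)] += weight
    return dict(tensor)


def subject_vectors(tensor):
    """Flatten the tensor along the subject mode.

    Each subject gets a sparse vector whose coordinates are (predicate, object) pairs.
    """
    vectors = defaultdict(dict)
    for (subj, pred, obj), weight in tensor.items():
        vectors[subj][(pred, obj)] = weight
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = sqrt(sum(weight * weight for weight in u.values()))
    norm_v = sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_terms(vectors, term, threshold=0.5):
    """Return (other_term, similarity) pairs at or above the threshold, best first."""
    scored = [
        (other, cosine(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    hits = [(other, score) for other, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


tensor = build_tensor(STATEMENTS)
vectors = subject_vectors(tensor)
print(related_terms(vectors, "aspirin"))
```

In this toy data, "aspirin" and "ibuprofen" never appear in the same statement, yet they share every (predicate, object) context, so the sketch flags them as related. That is the sense in which a relationship is implicit here; the paper's own similarity measures and rule inference may differ.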

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Algorithms and Data Compression