Distributional Framework for Emergent Knowledge Acquisition and its Application to Automated Document Annotation
Vit Novacek

TL;DR
This paper presents a tensor-based distributional framework for extracting emergent knowledge from large text corpora, demonstrated through automated annotation of biomedical articles with MeSH terms.
Contribution
The paper introduces a tensor-based distributional approach to unsupervised knowledge acquisition from text and applies it to biomedical document annotation.
Findings
Successfully inferred implicit term relationships
Generated conjunctive IF-THEN rules from text
Enhanced biomedical article annotation accuracy
Abstract
The paper introduces a framework for representation and acquisition of knowledge emerging from large samples of textual data. We utilise a tensor-based, distributional representation of simple statements extracted from text, and show how one can use the representation to infer emergent knowledge patterns from the textual data in an unsupervised manner. Examples of the patterns we investigate in the paper are implicit term relationships or conjunctive IF-THEN rules. To evaluate the practical relevance of our approach, we apply it to annotation of life science articles with terms from MeSH (a controlled biomedical vocabulary and thesaurus).
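To make the idea of a tensor-based, distributional representation of simple statements more concrete, the following is a minimal sketch and not the paper's actual implementation. It assumes that statements are (subject, predicate, object) triples with a numeric weight, stores them as a sparse 3-mode tensor, and compares subjects by the cosine similarity of their (predicate, object) profiles as one possible way to surface implicit term relationships. The toy statements, the weights, and all function names (`build_tensor`, `subject_vectors`, `cosine`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy statements: (subject, predicate, object, weight).
# Neither the triples nor the weights come from the paper.
STATEMENTS = [
    ("aspirin", "treats", "headache", 1.0),
    ("aspirin", "is_a", "drug", 1.0),
    ("ibuprofen", "treats", "headache", 1.0),
    ("ibuprofen", "is_a", "drug", 1.0),
    ("headache", "is_a", "symptom", 1.0),
]


def build_tensor(statements):
    """Store a sparse 3-mode tensor as {(subject, predicate, object): weight}."""
    tensor = defaultdict(float)
    for s, p, o, w in statements:
        tensor[(s, p, o)] += w  # repeated statements accumulate weight
    return dict(tensor)


def subject_vectors(tensor):
    """Flatten the tensor along the subject mode.

    Each subject gets a sparse vector indexed by (predicate, object) pairs.
    """
    vectors = defaultdict(dict)
    for (s, p, o), w in tensor.items():
        vectors[s][(p, o)] = w
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


tensor = build_tensor(STATEMENTS)
vectors = subject_vectors(tensor)

# Terms with identical (predicate, object) profiles score close to 1;
# terms with no shared profile entries score 0.
sim_drugs = cosine(vectors["aspirin"], vectors["ibuprofen"])
sim_mixed = cosine(vectors["aspirin"], vectors["headache"])
```

In this toy setting, "aspirin" and "ibuprofen" never occur in the same statement, yet they come out as highly similar because they share the same contexts. This is the kind of implicit relationship a distributional representation can expose. The paper's own weighting scheme and inference procedure may differ from this sketch.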
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Algorithms and Data Compression
