OASYS: Domain-Agnostic Automated System for Constructing Knowledge Base from Unstructured Text
Minsang Kim, Sang-hyun Je, Eunjoo Park

TL;DR
OASYS is a domain-agnostic, automated system that constructs knowledge bases from unstructured text. It trains without human intervention and is designed with Korean-language data in mind.
Contribution
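The paper introduces OASYS, described as the first automated knowledge base construction system built with the Korean language in mind, which trains solely on auto-generated data. It also provides a new human-annotated benchmark dataset that pairs the Korean Wikipedia corpus with a Korean DBpedia.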
Findings
The system achieves meaningful results on the human-annotated Korean benchmark test dataset
The generated knowledge base is useful for practical applications
Auto-generated training data alone is sufficient to train the system
Abstract
In recent years, creating and managing knowledge bases has become crucial in the retail product and enterprise domains. We present an automatic knowledge base construction system that mines data from documents. This system can generate training data during the training process without human intervention. It is therefore domain-agnostic: it can be trained using only the target domain text corpus and a pre-defined knowledge base. This system is called OASYS and is the first system built with the Korean language in mind. In addition, we have constructed a new human-annotated benchmark dataset of the Korean Wikipedia corpus paired with a Korean DBpedia to aid system evaluation. The system performance results on the human-annotated benchmark test dataset are meaningful and show that the knowledge base generated by OASYS, trained only on auto-generated data, is useful. We provide both a…
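The abstract says training data is generated automatically from a target-domain corpus and a pre-defined knowledge base, but does not say how. One common way to do this is distant supervision: a sentence that mentions both the subject and the object of a known triple is labeled with that triple's relation. The sketch below illustrates that general idea only; it is an assumption, not the OASYS implementation, and the names `Triple` and `auto_label` as well as the toy data are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    """A knowledge base fact (hypothetical structure for illustration)."""
    subject: str
    relation: str
    obj: str


def auto_label(sentences: List[str], kb: List[Triple]) -> List[Tuple[str, str, str, str]]:
    """Label sentences by plain substring matching against KB triples.

    Assumption: a sentence containing both the subject and the object of a
    triple expresses that triple's relation. This is a noisy heuristic.
    """
    examples = []
    for sentence in sentences:
        for triple in kb:
            if triple.subject in sentence and triple.obj in sentence:
                examples.append((sentence, triple.subject, triple.relation, triple.obj))
    return examples


# Toy data, invented for this example.
kb = [Triple("Seoul", "capitalOf", "South Korea")]
sentences = [
    "Seoul is the capital of South Korea.",
    "Busan is a port city.",
]
examples = auto_label(sentences, kb)
```

Only the first sentence is labeled, because the second contains neither entity. Substring matching is the simplest possible alignment; a system for Korean text would likely need morphological analysis or entity linking before matching, which this sketch leaves out.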
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Test · Balanced Selection
