A Unified Knowledge Graph Augmentation Service for Boosting Domain-specific NLP Tasks
Ruiqing Ding, Xiao Han, Leye Wang

TL;DR
KnowledgeDA is a unified service that enhances domain-specific NLP tasks by automatically augmenting training data with domain knowledge graphs, improving model performance across healthcare and software development domains.
Contribution
It introduces a novel, unified framework for injecting domain knowledge into PLMs during fine-tuning using knowledge graphs and data augmentation techniques.
Findings
Improves domain-specific text classification accuracy.
Enhances QA task performance in healthcare and software domains.
Demonstrates generalizability across different NLP tasks.
Abstract
By focusing pre-training on domain-specific corpora, several domain-specific pre-trained language models (PLMs) have achieved state-of-the-art results. However, designing a unified paradigm for injecting domain knowledge at the PLM fine-tuning stage remains under-investigated. We propose KnowledgeDA, a unified domain language model development service that enhances the task-specific training procedure with domain knowledge graphs. Given domain-specific task texts as input, KnowledgeDA automatically generates a domain-specific language model in three steps: (i) localize domain knowledge entities in the texts via an embedding-similarity approach; (ii) generate augmented samples by retrieving replaceable domain entity pairs from two views, the knowledge graph and the training data; (iii) select high-quality augmented samples for fine-tuning via confidence-based assessment. We implement a…
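The three steps in the abstract can be illustrated with a minimal, self-contained sketch. It is not KnowledgeDA's implementation, and every name in it is hypothetical. It assumes: a toy character-trigram embedding in place of a real text encoder for step (i); a made-up replaceability rule for step (ii), under which two entities are swappable if they share a (relation, tail) context in the knowledge graph and appear under a common label in the training data; and a caller-supplied `predict_proba` classifier for the confidence check in step (iii).

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail) in a hypothetical toy KG
Span = Tuple[int, int, str]    # (start token, end token, matched KG entity)


def embed(text: str) -> Counter:
    """Toy embedding (assumption): character-trigram counts, standing in for a real encoder."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def localize_entities(tokens: List[str], entities: Set[str],
                      threshold: float = 0.9, max_len: int = 3) -> List[Span]:
    """Step (i): greedily match the longest token n-gram whose embedding is close to a KG entity."""
    found: List[Span] = []
    start = 0
    while start < len(tokens):
        match = None
        for end in range(min(start + max_len, len(tokens)), start, -1):  # longest span first
            span_vec = embed(" ".join(tokens[start:end]))
            score, entity = max((cosine(span_vec, embed(e)), e) for e in entities)
            if score >= threshold:
                match = (start, end, entity)
                break
        if match:
            found.append(match)
            start = match[1]  # jump past the matched span so matches never overlap
        else:
            start += 1
    return found


def replaceable_pairs(triples: List[Triple],
                      entity_labels: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Step (ii), hypothetical rule: entities are replaceable if they share a (relation, tail)
    in the KG view AND occur under a common label in the training-data view."""
    by_context: Dict[Tuple[str, str], Set[str]] = {}
    for head, relation, tail in triples:
        by_context.setdefault((relation, tail), set()).add(head)
    pairs = set()
    for heads in by_context.values():
        for a in heads:
            for b in heads:
                if a != b and entity_labels.get(a, set()) & entity_labels.get(b, set()):
                    pairs.add((a, b))
    return pairs


def augment(tokens: List[str], spans: List[Span],
            pairs: Set[Tuple[str, str]]) -> List[List[str]]:
    """Create one new sample per substitution of a localized entity by a replaceable partner."""
    samples = []
    for start, end, entity in spans:
        for a, b in sorted(pairs):
            if a == entity:
                samples.append(tokens[:start] + b.split() + tokens[end:])
    return samples


def select_confident(samples: List[List[str]], label: str,
                     predict_proba: Callable[[List[str]], Dict[str, float]],
                     min_conf: float = 0.7) -> List[List[str]]:
    """Step (iii): keep samples the caller-supplied model still assigns to the original label."""
    return [s for s in samples if predict_proba(s).get(label, 0.0) >= min_conf]


# Toy usage with a hypothetical three-triple KG and a stub classifier.
triples = [("aspirin", "treats", "headache"), ("ibuprofen", "treats", "headache"),
           ("insulin", "treats", "diabetes")]
entity_labels = {"aspirin": {"pain"}, "ibuprofen": {"pain"}, "insulin": {"metabolic"}}
entities = {h for h, _, _ in triples} | {t for _, _, t in triples}

tokens = "patient took aspirin for a headache".split()
spans = localize_entities(tokens, entities)
candidates = augment(tokens, spans, replaceable_pairs(triples, entity_labels))
kept = select_confident(candidates, "pain",
                        lambda s: {"pain": 0.9 if "headache" in s else 0.2})
print(spans)
print(kept)
```

With these toy inputs, only `aspirin` has a replaceable partner (`ibuprofen`, which shares the `treats headache` context and the `pain` label), so a single augmented sentence is generated and survives the confidence filter. The greedy longest-span matching and the fixed thresholds are illustrative choices; the abstract does not say how candidate spans are enumerated or which confidence cutoff is used.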
