Graphine: A Dataset for Graph-aware Terminology Definition Generation
Zequn Liu, Shukai Wang, Yiyang Gu, Ruiyi Zhang, Ming Zhang, Sheng Wang

TL;DR
Graphine is a large-scale biomedical terminology definition dataset with graph structure. It enables graph-aware definition generation models such as Graphex, which outperforms existing text generation models, and it also supports evaluating pretrained language models and comparing graph representation learning methods.
Contribution
The paper introduces Graphine, a comprehensive dataset of over 2 million terminology definitions with graph structures, and proposes Graphex, a novel graph-aware text generation model.
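To make "terminology definitions with graph structures" concrete, here is a minimal Python sketch of how term-definition pairs arranged in a directed acyclic graph might be stored and queried. It is an illustration only: the record layout, the three toy terms and their definitions, and the helper names `ancestors` and `build_context` are all assumptions, not Graphine's actual format or Graphex's method (which, per the abstract, combines a transformer with a graph neural network rather than concatenating text).

```python
from collections import deque

# Toy records, NOT taken from Graphine: term -> (definition, parent terms).
# The layout is a hypothetical stand-in for a term DAG.
TERMS = {
    "cell": ("The basic structural unit of living organisms.", []),
    "blood cell": ("A cell found in blood.", ["cell"]),
    "erythrocyte": ("A blood cell that carries oxygen.", ["blood cell"]),
}


def ancestors(term, terms=TERMS):
    """Return every ancestor of `term`, nearest first (breadth-first walk)."""
    seen, order = set(), []
    queue = deque(terms[term][1])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # a DAG can reach the same ancestor by several paths
        seen.add(node)
        order.append(node)
        queue.extend(terms[node][1])
    return order


def build_context(term, terms=TERMS):
    """Join ancestor definitions into one string a generator could condition on."""
    return " ".join(f"{a}: {terms[a][0]}" for a in ancestors(term, terms))


if __name__ == "__main__":
    print(ancestors("erythrocyte"))
    print(build_context("erythrocyte"))
```

The point of the sketch is only that a term's neighbors in the graph carry definitions that are useful context when defining the term itself, which is what a graph-aware generator can exploit.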
Findings
Graphex outperforms existing text generation models.
Graphine enables evaluation of pretrained language models.
Graphine supports comparison of graph representation learning methods.
Abstract
Precisely defining terminology is the first step in scientific communication. Developing neural text generation models for definition generation can circumvent labor-intensive curation, further accelerating scientific discovery. Unfortunately, the lack of a large-scale terminology definition dataset hinders progress toward definition generation. In this paper, we present Graphine, a large-scale terminology definition dataset covering 2,010,648 terminology-definition pairs and spanning 227 biomedical subdisciplines. The terminologies in each subdiscipline further form a directed acyclic graph, opening up new avenues for developing graph-aware text generation models. We then propose Graphex, a novel graph-aware definition generation model that integrates a transformer with a graph neural network. Our model outperforms existing text generation models by exploiting the graph structure of…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
