Method and Dataset Entity Mining in Scientific Literature: A CNN + Bi-LSTM Model with Self-attention
Linlin Hou, Ji Zhang, Ou Wu, Ting Yu, Zhen Wang, Zhao Li, Jianliang Gao, Yingchun Ye, Rujing Yao

TL;DR
This paper introduces MDER, a novel entity recognition model that combines a CNN and a Bi-LSTM with self-attention to extract method and dataset entities from scientific literature, supporting domain analysis and recommendation systems.
Contribution
The paper presents a new entity recognition model that effectively extracts method and dataset information from scientific texts using rule embedding and deep learning techniques.
Findings
The model performs well across four computer science domains.
Data augmentation improves model robustness with limited data.
Modules within the model collectively enhance recognition accuracy.
Abstract
Literature analysis helps researchers understand the development of science and technology. Traditional literature analysis focuses largely on literature metadata such as topics, authors, abstracts, keywords, and references, and pays little attention to the main content of papers. In many scientific domains such as science, computing, and engineering, the methods and datasets involved in published papers carry important information and are useful for domain analysis as well as algorithm and dataset recommendation. In this paper, we propose a novel entity recognition model, called MDER, which effectively extracts method and dataset entities from the main textual content of scientific papers. The model utilizes rule embedding and adopts a parallel structure of CNN and Bi-LSTM with the…
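The parallel CNN and Bi-LSTM structure with self-attention named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated and does not say how the two branches are merged or which attention variant is used. The sketch assumes that per-token features from the two branches are concatenated and then passed through scaled dot-product self-attention with no learned projections. The feature values and the helper names (softmax, self_attention, combine_branches) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention without learned projections.

    Assumption: queries, keys, and values are all the input vectors
    themselves; a trained model would apply learned weight matrices.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token in the sentence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all token vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, tokens)) for i in range(dim)]
        )
    return outputs, weights


def combine_branches(cnn_feats, lstm_feats):
    """Concatenate per-token features from the two parallel branches (assumed merge)."""
    return [c + l for c, l in zip(cnn_feats, lstm_feats)]


# Hypothetical per-token features standing in for real CNN and Bi-LSTM outputs.
cnn_feats = [[0.1, 0.3], [0.7, 0.2], [0.0, 0.5]]
lstm_feats = [[0.4, 0.1], [0.2, 0.9], [0.6, 0.6]]

tokens = combine_branches(cnn_feats, lstm_feats)
attended, weights = self_attention(tokens)
```

Each row of weights is a probability distribution over the tokens of the sentence, so every attended vector is a convex combination of the merged token features. In a full tagger, a classification layer over these vectors would assign method or dataset labels to tokens.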
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Data Quality and Management
