Feature engineering vs. deep learning for paper section identification: Toward applications in Chinese medical literature
Sijia Zhou, Xin Li

TL;DR
This paper compares traditional machine learning and deep learning approaches for identifying sections in Chinese medical literature, and proposes a novel SLSTM model that outperforms existing methods with nearly 90% precision and recall.
Contribution
It introduces the Structural Bidirectional LSTM (SLSTM) model for Chinese literature section identification, demonstrating its superiority over traditional and other deep learning methods.
Findings
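Among classic ML algorithms, Conditional Random Fields (CRFs), which consider sentence interdependency, are more effective at combining feature sets such as bag-of-words, part-of-speech, and headings.
Existing deep learning models are less effective than traditional ML for this task.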
The SLSTM model achieves nearly 90% precision and recall.
Abstract
Section identification is an important task for library science, especially knowledge management. Identifying the sections of a paper would help filter noise in entity and relation extraction. In this research, we studied the paper section identification problem in the context of Chinese medical literature analysis, where the subjects, methods, and results are more valuable from a physician's perspective. Based on previous studies on English literature section identification, we experimented with the effective features to use with classic machine learning algorithms to tackle the problem. We found that Conditional Random Fields, which consider sentence interdependency, are more effective in combining different feature sets, such as bag-of-words, part-of-speech, and headings, for Chinese literature section identification. Moreover, we find that classic machine learning algorithms are…
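To make the idea of combining feature sets concrete, here is a minimal sketch of how bag-of-words, part-of-speech, and heading features for each sentence might be merged into one feature dictionary, with a paper represented as an ordered sequence of such dictionaries. It is not the authors' implementation. The function names, the feature-key prefixes, the toy sentences, and the assumption that the text has already been word-segmented and POS-tagged are all illustrative.

```python
from collections import Counter


def sentence_features(tokens, pos_tags, heading):
    """Merge three feature sets for one sentence into a single dict.

    Hypothetical helper: the key prefixes ("bow=", "pos=", "heading=")
    are an illustrative convention, not taken from the paper.
    """
    feats = {"bias": 1.0}
    # Bag-of-words: how often each (pre-segmented) token occurs.
    for tok, n in Counter(tokens).items():
        feats["bow=" + tok] = float(n)
    # Part-of-speech: how often each tag occurs in the sentence.
    for tag, n in Counter(pos_tags).items():
        feats["pos=" + tag] = float(n)
    # Heading: the heading the sentence appears under, if one is known.
    feats["heading=" + (heading or "NONE")] = 1.0
    return feats


def paper_to_sequence(sentences):
    """Turn one paper into an ordered list of per-sentence feature dicts."""
    return [
        sentence_features(s["tokens"], s["pos"], s.get("heading"))
        for s in sentences
    ]


# Made-up toy input: two pre-segmented, pre-tagged sentences.
paper = [
    {"tokens": ["选取", "患者"], "pos": ["v", "n"], "heading": "对象"},
    {"tokens": ["结果", "显示", "症状", "改善"],
     "pos": ["n", "v", "n", "v"], "heading": "结果"},
]
X = paper_to_sequence(paper)
```

Keeping the sentences in document order matters here: a linear-chain CRF trained on such sequences can learn label transitions between adjacent sentences, which is one way to capture the sentence interdependency the abstract refers to. A classifier applied to each sentence on its own would see the same features but not that ordering.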
Taxonomy
MethodsLib
