A New Data Representation Based on Training Data Characteristics to Extract Drug Named-Entity in Medical Text
Sadikin Mujiono, Mohamad Ivan Fanany, Chan Basaruddin

TL;DR
This paper introduces three novel data representation techniques based on word distribution and similarity for drug name recognition in medical texts, significantly improving F-score performance over existing methods.
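The F-score used to compare these methods is normally the balanced F-score (F1), the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The sketch below computes it at the entity level. It assumes exact-span matching between gold and predicted drug mentions, because the matching criterion the paper uses is not stated in this summary. The `Entity` tuple layout and the `entity_f1` name are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical entity layout: (sentence_id, start_token, end_token, label)
Entity = Tuple[int, int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-span matching (an assumption)."""
    true_positives = len(gold & predicted)  # spans whose boundaries and label match exactly
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1
```

Suppose there are three gold mentions and four predictions, and two of the predictions match exactly. Precision is then 0.5, recall is 2/3, and F1 = 4/7 ≈ 0.571. Under exact matching, a prediction that covers only the first token of a two-token gold name counts as both a false positive and a false negative.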
Contribution
It proposes new data representation methods tailored to medical text characteristics, enhancing drug entity extraction accuracy in unstructured and evolving medical corpora.
Findings
The LSTM-based sequence representation achieved an F-score of 0.8645.
The new techniques outperform previous state-of-the-art methods.
Deep learning models benefit from the proposed data representations.
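The LSTM-based sequence representation is built on a Long Short-Term Memory network, whose gates combine the sigmoid and tanh activations listed among the paper's methods. The following is a minimal sketch of one time step, assuming the standard LSTM cell with a forget gate. The paper's layer sizes, depth, and exact variant are not given in this summary. The names `lstm_step`, `affine`, and the `params` layout are hypothetical, and plain Python lists stand in for a tensor library.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical layout: gate name ("i", "f", "o", "g") -> (W, U, b)
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Compute W x + U h + b for one gate."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b_i
        for w_row, u_row, b_i in zip(W, U, b)
    ]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector, params: Params) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell; returns (h, c)."""
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h_prev)]  # candidate cell state
    # New cell state: keep part of the old state, add part of the candidate
    c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c_prev, i, g)]
    # New hidden state: gated tanh of the cell state
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c
```

With all weights and biases at zero, every sigmoid gate outputs 0.5 and the candidate is tanh(0) = 0. The cell state is therefore halved at each step. This makes a convenient sanity check before plugging in trained parameters.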
Abstract
One essential task in information extraction from the medical corpus is drug name recognition. Compared with text from other domains, medical text is special and has unique characteristics. Medical text mining also poses more challenges, e.g., more unstructured text, the rapid addition of new terms, and a wide range of name variations for the same drug. The mining is even more challenging due to the lack of labeled datasets and external knowledge, as well as the multiple-token representations of a single drug name that are more common in real application settings. Although many approaches have been proposed to tackle the task, some problems remain, with poor F-score performance (less than 0.75). This paper presents a new treatment of data representation techniques to overcome some of those challenges. We propose three data representation…
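A drug name that spans several tokens (for example, "sodium valproate") must be reassembled into a single entity after token-level tagging. One common way to do this is a B/I/O scheme. B marks the first token of a drug name, I marks a continuation token, and O marks any other token. This scheme is assumed here for illustration and is not necessarily the encoding the authors use. The helper `bio_to_spans` is hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group B/I-tagged tokens into (start, end_inclusive, text) drug-name spans."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new name begins; close any name still open
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "I" and start is not None:
            continue  # extend the current multi-token name
        else:
            # "O", or an "I" with no preceding "B" (treated as outside)
            if start is not None:
                spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    return [(s, e, " ".join(tokens[s:e + 1])) for s, e in spans]
```

For the tokens `Patients received sodium valproate and aspirin .` tagged `O O B I O B O`, it returns `[(2, 3, 'sodium valproate'), (5, 5, 'aspirin')]`. The two-token name is recovered as one entity.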
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Advanced Text Analysis Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
