An automated domain-independent text reading, interpreting and extracting approach for reviewing the scientific literature
Amauri J Paula

TL;DR
This paper introduces a machine learning-based NLP approach, a.RIX, that automatically extracts key categorical and numerical data from scientific articles across fields without needing text annotation, streamlining literature review processes.
Contribution
The novel a.RIX system combines multiple ML models to extract parameters from scientific texts without POS tagging or NER, applicable across various scientific domains.
Findings
Successfully extracted parameters from 7,873 articles on natural products
Operates without text annotation or supervised training
Potential to replace manual article review processes
Abstract
Presented here is a machine learning (ML)-based natural language processing (NLP) approach capable of automatically recognizing and extracting categorical and numerical parameters from a corpus of articles. The approach (named a.RIX) operates with a concomitant/interchangeable use of ML models such as neural networks (NNs), latent semantic analysis (LSA), naive Bayes classifiers (NBC), and a pattern recognition model using regular expressions (REGEX). A corpus of 7,873 scientific articles dealing with natural products (NPs) was used to demonstrate the efficiency of the a.RIX engine. The engine automatically extracts categorical and numerical parameters such as (i) the plant species from which active molecules are extracted, (ii) the microorganism species against which the active molecules act, and (iii) the values of minimum inhibitory concentration (MIC) against these microorganisms.…
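To make the regular-expression component of the abstract concrete, here is a minimal sketch of how a MIC value and its unit might be pulled from a sentence. It is not the a.RIX implementation: the pattern, the name `extract_mic`, the 40-character filler window, and the list of accepted units are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper):
# the abbreviation "MIC", a short run of filler text, a number, and a unit.
MIC_PATTERN = re.compile(
    r"\bMIC\b"                     # the abbreviation as a whole word
    r"[^.\d]{0,40}?"               # up to 40 filler characters, no digits or periods
    r"(?P<value>\d+(?:\.\d+)?)"    # integer or decimal value
    r"\s*"                         # optional whitespace before the unit
    r"(?P<unit>(?:µg|μg|ug|mg)/mL)",  # a few common unit spellings
    re.IGNORECASE,
)


def extract_mic(text):
    """Return a list of (value, unit) pairs for each MIC mention matched in text."""
    return [
        (float(m.group("value")), m.group("unit"))
        for m in MIC_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "The extract showed an MIC of 12.5 ug/mL against S. aureus."
    print(extract_mic(sentence))
```

A pattern like this only captures values that appear shortly after the abbreviation within the same sentence. Ranges, values listed in tables, and units written in other forms would need additional patterns.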
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques · Topic Modeling
