Large-scale investigation of weakly-supervised deep learning for the fine-grained semantic indexing of biomedical literature
Anastasios Nentidis, Thomas Chatzopoulos, Anastasia Krithara, Grigorios Tsoumakas, Georgios Paliouras

TL;DR
This study explores weakly-supervised deep learning techniques to improve the fine-grained semantic indexing of biomedical literature, focusing on refining MeSH concept annotations using large-scale data and heuristic methods.
Contribution
It introduces a new deep learning-based method for refining MeSH concept annotations, leveraging weak supervision and heuristic enhancements in biomedical literature indexing.
Findings
Concept occurrence heuristic achieves a macro-F1 of about 0.63
Proposed method improves heuristic performance by over 4 percentage points
Weak supervision effectively refines coarse-grained biomedical labels
Abstract
Objective: Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors, with several related but distinct biomedical concepts often grouped together and treated as a single topic. This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts. Methods: Lacking labelled data, we rely on weak supervision based on concept occurrence in the abstract of an article, which is also enhanced by dictionary-based heuristics. In addition, we investigate deep learning approaches, making design choices to tackle the particular challenges of this task. The new method is evaluated on a large-scale retrospective scenario, based on concepts that have been promoted to descriptors. Results: In our experiments, concept occurrence was the strongest heuristic, achieving a macro-F1 score of about 0.63 across several labels. The…
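Illustration
The abstract describes weak labelling by concept occurrence and evaluation by macro-F1. The following is a minimal sketch of both ideas, not the authors' implementation. The lexicon `CONCEPT_TERMS`, the concept names, and the functions `weak_labels` and `macro_f1` are hypothetical and introduced only for illustration; the matching rule (case-insensitive whole-phrase match) is an assumption.

```python
import re

# Hypothetical lexicon mapping a concept label to its surface terms.
# These names and terms are placeholders, not taken from the paper or from MeSH.
CONCEPT_TERMS = {
    "concept_a": ["alpha syndrome", "alpha disease"],
    "concept_b": ["beta disorder"],
}


def weak_labels(abstract, concept_terms=CONCEPT_TERMS):
    """Assign a concept as a weak label if any of its terms occurs in the abstract."""
    text = abstract.lower()
    labels = set()
    for concept, terms in concept_terms.items():
        # Assumed matching rule: case-insensitive match on word boundaries.
        if any(re.search(r"\b" + re.escape(t.lower()) + r"\b", text) for t in terms):
            labels.add(concept)
    return labels


def macro_f1(gold, predicted, labels):
    """Unweighted mean of per-label F1 over parallel lists of label sets."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, predicted) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, predicted) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, predicted) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # A label with no gold and no predicted instances scores 0.0 here (a convention).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro-averaging gives every label the same weight regardless of how many articles carry it, so rare concepts influence the score as much as frequent ones.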
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Advanced Text Analysis Techniques
