Make Literature-Based Discovery Great Again through Reproducible Pipelines
Bojan Cestnik, Andrej Kastrin, Boshko Koloski, Nada Lavrač

TL;DR
This paper introduces a reproducible pipeline for literature-based discovery (LBD) built on Jupyter Notebooks, enabling transparent, collaborative, and robust generation of research hypotheses from the scientific literature.
Contribution
It provides a set of open-access notebooks demonstrating traditional and novel bisociative LBD methods, enhancing reproducibility and collaboration in the field.
Findings
Reproducible LBD pipelines improve research transparency.
Open-access notebooks facilitate method comparison and reuse.
Ensemble and outlier-based approaches show promising results.
Abstract
By connecting disparate sources of scientific literature, literature-based discovery (LBD) methods help to uncover new knowledge and generate new research hypotheses that cannot be found from domain-specific documents alone. Our work focuses on bisociative LBD methods that combine bisociative reasoning with LBD techniques. The paper presents LBD through the lens of reproducible science to ensure the reproducibility of LBD experiments, overcome the inconsistent use of benchmark datasets and methods, trigger collaboration, and advance the LBD field toward more robust and impactful scientific discoveries. The main novelty of this study is a collection of Jupyter Notebooks that illustrate the steps of the bisociative LBD process, including data acquisition, text preprocessing, hypothesis formulation, and evaluation. The contributed notebooks implement a selection of traditional LBD…
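The core idea of connecting two literatures can be illustrated with the classic ABC model of closed discovery (as in Swanson's migraine and magnesium example): terms that occur in both the domain A literature and the domain C literature are candidate bridging (B) terms. The sketch below is a minimal illustration under stated assumptions, not code from the paper's notebooks. The function names (`tokenize`, `doc_frequency`, `bridging_terms`), the stopword list, and the example sentences are all hypothetical and invented for illustration; they are not real abstracts.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "and", "are", "with", "during", "for", "from", "that", "this"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, drop stopwords and tokens of 1-2 letters."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) > 2 and t not in STOPWORDS]


def doc_frequency(docs):
    """Count in how many documents each term occurs (document frequency)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def bridging_terms(domain_a, domain_c, min_df=1):
    """Return candidate B-terms shared by both literatures.

    Candidates are ranked by the smaller of their two document frequencies
    (terms well supported in both domains first), with ties broken alphabetically.
    """
    df_a = doc_frequency(domain_a)
    df_c = doc_frequency(domain_c)
    shared = [t for t in df_a
              if t in df_c and df_a[t] >= min_df and df_c[t] >= min_df]
    return sorted(shared, key=lambda t: (-min(df_a[t], df_c[t]), t))


# Invented toy "abstracts" for the two domains (not real data).
migraine_docs = [
    "Migraine attacks are associated with cortical spreading depression.",
    "Serotonin levels fluctuate during migraine.",
    "Calcium channel blockers reduce migraine frequency.",
]
magnesium_docs = [
    "Magnesium deficiency promotes cortical spreading depression.",
    "Magnesium acts as a natural calcium channel blocker.",
    "Magnesium supplementation in athletes.",
]

candidates = bridging_terms(migraine_docs, magnesium_docs)
print(candidates)  # ['calcium', 'channel', 'cortical', 'depression', 'spreading']
```

Note that `blocker` and `blockers` are not matched: in a fuller pipeline the text-preprocessing step (for example, stemming or lemmatization) would handle such variants, and the hypothesis-formulation step would rank candidates more carefully than raw term overlap, for instance with the outlier-based or ensemble approaches mentioned in the findings.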
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Semantic Web and Ontologies
