PubSqueezer: A Text-Mining Web Tool to Transform Unstructured Documents into Structured Data
Alberto Calderone

TL;DR
PubSqueezer is a web tool that uses text mining to convert unstructured biomedical literature into structured data, facilitating exploration, connection, and analysis of scientific papers for insights and discoveries.
Contribution
It introduces a novel web-based text mining tool that transforms large collections of biomedical articles into structured data for easier analysis and insight generation.
Findings
Built a rare diseases network from literature data
Analyzed SARS-CoV-2 literature to extract known facts
Enabled integration of scientific literature into computational analyses
Abstract
The number of scientific papers published every day is daunting and constantly increasing. Keeping up with the literature is a challenge. If one wants to start exploring a new topic, it is hard to get the big picture without reading many articles. Furthermore, as one reads through the literature, making mental connections is crucial for asking new questions that might lead to discoveries. In this work, I present a web tool that uses a text-mining strategy to transform large collections of unstructured biomedical articles into structured data. The generated results give a quick overview of complex topics and can suggest information that is not explicitly reported. In particular, I show two data science analyses. First, I present a literature-based rare diseases network built using this tool, in the hope that it will help clarify some aspects of these less popular pathologies. Secondly, I…
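The general idea of turning free text into structured records, and then into a literature-based network, can be illustrated with a minimal sketch. This is not PubSqueezer's implementation, which the abstract does not describe in detail. The vocabulary, the toy sentences, and the function names (`extract_terms`, `structure_documents`, `cooccurrence_edges`) are all assumptions made for illustration, and simple dictionary matching plus document-level co-occurrence stands in for whatever text-mining strategy the tool actually uses.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary (term -> category); not PubSqueezer's dictionary.
VOCAB = {
    "cystic fibrosis": "disease",
    "cftr": "gene",
    "huntington disease": "disease",
    "htt": "gene",
}


def extract_terms(text, vocab=VOCAB):
    """Return the sorted vocabulary terms that occur as whole words in text."""
    lowered = text.lower()
    found = set()
    for term in vocab:
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(term)
    return sorted(found)


def structure_documents(docs, vocab=VOCAB):
    """Turn {doc_id: text} into a list of structured records."""
    return [
        {"doc_id": doc_id, "terms": extract_terms(text, vocab)}
        for doc_id, text in docs.items()
    ]


def cooccurrence_edges(records):
    """Count how many documents mention each pair of terms together."""
    edges = Counter()
    for record in records:
        for a, b in combinations(record["terms"], 2):
            edges[(a, b)] += 1
    return edges


# Toy sentences standing in for article abstracts (illustrative only).
DOCS = {
    "doc1": "Mutations in CFTR cause cystic fibrosis.",
    "doc2": "CFTR modulators improve outcomes in cystic fibrosis.",
    "doc3": "Expanded repeats in HTT underlie Huntington disease.",
}

records = structure_documents(DOCS)
edges = cooccurrence_edges(records)

for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b} (documents: {weight})")
```

Each edge weight is the number of documents in which both terms appear, so the resulting pairs can be loaded into any graph library as a weighted network. A real pipeline would need curated ontologies and synonym handling, which this sketch leaves out.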
Taxonomy
Topics: Bioinformatics and Genomic Networks · Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics
