# ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing

**Authors:** Mark Neumann, Daniel King, Iz Beltagy, Waleed Ammar

arXiv: 1902.07669 · 2021-03-24

## TL;DR

ScispaCy is a toolkit built on spaCy that provides fast, robust models for biomedical and scientific text, a domain where many statistical NLP models perform poorly because of domain shift.

## Contribution

The paper introduces scispaCy, a practical, publicly available toolkit for biomedical and scientific NLP that builds heavily on spaCy. It reports the performance of the two model packages released with the toolkit.

## Key findings

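- The two released model packages are robust across several tasks and datasets
- Many statistical NLP models perform extremely poorly under domain shift to biomedical and clinical text, which is the gap scispaCy targets
- The toolkit is fast, practical, and publicly available, with models and code at https://allenai.github.io/scispacy/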

## Abstract

Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift. Processing biomedical and clinical text is a critically important application area of natural language processing, for which there are few robust, practical, publicly available models. This paper describes scispaCy, a new tool for practical biomedical/scientific text processing, which heavily leverages the spaCy library. We detail the performance of two packages of models released in scispaCy and demonstrate their robustness on several tasks and datasets. Models and code are available at https://allenai.github.io/scispacy/
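Because scispaCy models are distributed as spaCy pipelines, using one can be sketched as below. This is a minimal illustration, not the authors' code. It assumes that spaCy and a scispaCy model package are installed, and that the package is named `en_core_sci_sm` (a name that does not appear in this summary). The helper names `load_pipeline` and `extract_entities` are hypothetical. The sketch returns `None` or an empty list instead of failing when the library or model is unavailable.

```python
import importlib.util


def load_pipeline(model_name="en_core_sci_sm"):
    """Return a loaded spaCy pipeline, or None if spaCy or the model is missing.

    The default model name is an assumption; substitute whichever
    scispaCy model package is actually installed.
    """
    if importlib.util.find_spec("spacy") is None:
        return None  # spaCy itself is not installed
    import spacy

    try:
        return spacy.load(model_name)
    except OSError:
        # spacy.load raises OSError when the named package cannot be found
        return None


def extract_entities(nlp, text):
    """Return (text, label) pairs for entity spans, or [] without a pipeline."""
    if nlp is None:
        return []
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    nlp = load_pipeline()
    sample = "Spinal and bulbar muscular atrophy is an inherited motor neuron disease."
    print(extract_entities(nlp, sample))
```

The same `nlp` object exposes the usual spaCy `Doc` attributes (tokens, sentences, part-of-speech tags, dependency arcs), so existing spaCy code should carry over with only the model name changed.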

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.07669/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1902.07669/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/1902.07669/full.md

---
Source: https://tomesphere.com/paper/1902.07669