TL;DR
Spark NLP is a scalable NLP library built on Apache Spark ML. It ships 1100 pre-trained pipelines and models in more than 192 languages and covers nearly all NLP tasks for enterprise use.
Contribution
The paper presents a scalable NLP library with 1100 pre-trained pipelines and models that runs multi-language NLP tasks in a distributed environment and is widely adopted in industry.
Findings
More than 2.7 million downloads, with ninefold growth since January 2020
Supports more than 192 languages and nearly all NLP tasks
Used by 54% of healthcare organizations
Abstract
Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 1100 pre-trained pipelines and models in more than 192 languages. It supports nearly all NLP tasks, with modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and with ninefold growth since January 2020, Spark NLP is used by 54% of healthcare organizations and is the world's most widely used NLP library in the enterprise.
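As a rough illustration of the pre-trained pipelines the abstract mentions, the sketch below loads one through the library's Python API and annotates a single string. It is a minimal sketch, not taken from the paper: the helper name `annotate_with_pretrained` is hypothetical, and the choice of the `explain_document_dl` English pipeline is an assumption made for the example.

```python
def annotate_with_pretrained(text, pipeline_name="explain_document_dl", lang="en"):
    """Annotate one string with a pre-trained Spark NLP pipeline.

    Hypothetical helper for illustration. The pipeline name and language
    defaults are assumptions, not values given in the paper.
    """
    # Imported lazily so the function can be defined without Spark installed.
    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    # Start (or reuse) a Spark session configured for Spark NLP.
    sparknlp.start()

    # Download the pipeline on first use, then load it from the local cache.
    pipeline = PretrainedPipeline(pipeline_name, lang=lang)

    # annotate() on a string returns a dict mapping output column names
    # to lists of annotation results.
    return pipeline.annotate(text)


if __name__ == "__main__":
    annotations = annotate_with_pretrained(
        "Spark NLP is built on top of Apache Spark ML."
    )
    for column, values in annotations.items():
        print(column, values)
```

Running the script requires the `spark-nlp` and `pyspark` packages, a Java runtime, and network access for the first model download. The same pipeline object can also transform a Spark DataFrame, which is where the distributed scaling described in the abstract would apply.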
