Spark NLP: Natural Language Understanding at Scale

Veysel Kocaman; David Talby

arXiv:2101.10848·cs.CL·January 27, 2021

TL;DR

Spark NLP is a scalable, high-performance NLP library built on Apache Spark, offering extensive pre-trained models and supporting nearly all NLP tasks for enterprise use across multiple languages.

Contribution

The paper presents Spark NLP, a scalable NLP library with 1100 pre-trained pipelines and models that supports multilingual tasks in a distributed environment and is widely adopted in industry.

Findings

01 Downloaded more than 2.7 million times, with ninefold growth since January 2020

02 Supports more than 192 languages and nearly all NLP tasks

03 Used by 54% of healthcare organizations

Abstract

Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 1100 pre-trained pipelines and models in more than 192 languages. It supports nearly all NLP tasks, with modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and with ninefold growth since January 2020, Spark NLP is used by 54% of healthcare organizations and is the world's most widely used NLP library in the enterprise.
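As a rough illustration of what using a pre-trained pipeline might look like from Python, here is a minimal sketch. It assumes the `spark-nlp` and `pyspark` packages are installed and that the machine can download models. The helper name `annotate_text` is hypothetical, and the pipeline name `explain_document_dl` comes from the library's public documentation rather than from this page, so treat both as assumptions.

```python
def annotate_text(text, pipeline_name="explain_document_dl", lang="en"):
    """Annotate one string with a pre-trained Spark NLP pipeline.

    Hypothetical helper for illustration; the default pipeline name is
    an assumption and is not taken from the abstract.
    """
    # Imported lazily so this module still loads where Spark NLP is absent.
    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    # Start (or reuse) a Spark session configured for Spark NLP.
    sparknlp.start()

    # Download the named pipeline on first use, then load it.
    pipeline = PretrainedPipeline(pipeline_name, lang=lang)

    # For a single string, annotate() returns a dict that maps each
    # output column of the pipeline to a list of string results.
    return pipeline.annotate(text)
```

Calling `annotate_text("Spark NLP runs on Apache Spark.")` would then return the annotations for that sentence. Which keys appear in the result depends on the stages in the chosen pipeline.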

Code & Models

Repositories

JohnSnowLabs/spark-nlp
tf · Official
