# What's Wrong with Hebrew NLP? And How to Make it Right

**Authors:** Reut Tsarfaty, Amit Seker, Shoval Sadde, Stav Klein

arXiv: 1908.05453 · 2019-08-16

## TL;DR

This paper introduces Onlp, a joint morpho-syntactic parsing framework for Modern Hebrew. By inferring morphology and syntax together, it limits the propagation of early morphological disambiguation errors that degrade pipeline tools on morphologically-rich languages.

## Contribution

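The paper describes the design and use of a joint inference framework over morphology and syntax for Modern Hebrew. The framework limits error propagation, produces rich and expressive output, and comes with an online demo, addressing the sub-optimal performance of existing annotation pipelines on morphologically-rich languages.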

## Key findings

- Onlp achieves high accuracy in Hebrew morphological and syntactic parsing.
- Joint inference reduces error propagation compared to pipeline approaches.
- The tool supports diverse academic and commercial applications.
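
The contrast between pipeline and joint inference in the findings above can be illustrated with a toy sketch. This is not Onlp's algorithm or API: the two-token lattice, the tag names, and every score below are invented for illustration, and the exhaustive search stands in for whatever inference procedure the real system uses. The pipeline commits to the locally best morphological analysis of each token before syntax is consulted, while the joint variant scores complete analysis sequences with morphology and syntax combined.

```python
from itertools import product

# Hypothetical morphological lattice: one dict per token, mapping each
# candidate analysis to a made-up morphological score.
LATTICE = [
    {"NOUN": 0.6, "PREP+NOUN": 0.4},  # ambiguous first token
    {"VERB": 0.7, "ADJ": 0.3},        # ambiguous second token
]

# Made-up syntactic plausibility of each complete analysis sequence.
SYNTAX = {
    ("NOUN", "VERB"): 0.1,
    ("NOUN", "ADJ"): 0.2,
    ("PREP+NOUN", "VERB"): 0.9,
    ("PREP+NOUN", "ADJ"): 0.3,
}


def sequence_score(seq):
    """Combined score: product of per-token morphology scores and the syntax score."""
    morph = 1.0
    for candidates, analysis in zip(LATTICE, seq):
        morph *= candidates[analysis]
    return morph * SYNTAX[seq]


def pipeline_parse():
    """Pick the best morphological analysis per token first, then accept
    whatever syntax score that fixed sequence receives."""
    seq = tuple(max(candidates, key=candidates.get) for candidates in LATTICE)
    return seq, sequence_score(seq)


def joint_parse():
    """Search all analysis sequences and keep the one with the best combined score."""
    best = max(product(*LATTICE), key=sequence_score)
    return best, sequence_score(best)


if __name__ == "__main__":
    print("pipeline:", pipeline_parse())
    print("joint:   ", joint_parse())
```

In this toy setup the pipeline locks in `NOUN` for the first token because it has the higher morphological score, and the later syntactic evidence cannot undo that choice. The joint search instead selects `PREP+NOUN`, since the sequence as a whole scores higher once syntax is taken into account.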

## Abstract

For languages with simple morphology, such as English, automatic annotation pipelines such as spaCy or Stanford's CoreNLP successfully serve projects in academia and the industry. For many morphologically-rich languages (MRLs), similar pipelines show sub-optimal performance that limits their applicability for text analysis in research and the industry. The sub-optimal performance is mainly due to errors in early morphological disambiguation decisions, which cannot be recovered later in the pipeline, yielding incoherent annotations on the whole. In this paper we describe the design and use of the Onlp suite, a joint morpho-syntactic parsing framework for processing Modern Hebrew texts. The joint inference over morphology and syntax substantially limits error propagation, and leads to high accuracy. Onlp provides rich and expressive output which already serves diverse academic and commercial needs. Its accompanying online demo further serves educational activities, introducing Hebrew NLP intricacies to researchers and non-researchers alike.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.05453/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1908.05453/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1908.05453/full.md

---
Source: https://tomesphere.com/paper/1908.05453