TL;DR
This paper introduces a constrained decoding approach for NER taggers. Suppressing illegal tag transitions at the output lets a tagger trained with cross-entropy loss train about twice as fast as a CRF-based model, with no statistically significant difference in F1, which removes the need for a CRF layer.
Contribution
The authors propose a constrained decoding method for NER that uses prior knowledge of the span encoding scheme to rule out illegal tag transitions, rather than relying on a CRF to learn them. The resulting tagger trains faster and matches CRF-based models in F1.
Findings
A tagger trained with cross-entropy loss and constrained output trains twice as fast as a CRF-based model.
Differences in F1 between constrained decoding and CRF models are statistically insignificant.
Open source implementations are provided in PyTorch and TensorFlow.
Abstract
Current state-of-the-art models for named entity recognition (NER) are neural models with a conditional random field (CRF) as the final layer. Entities are represented as per-token labels with a special structure in order to decode them into spans. Current work eschews prior knowledge of how the span encoding scheme works and relies on the CRF learning which transitions are illegal and which are not to facilitate global coherence. We find that by constraining the output to suppress illegal transitions we can train a tagger with a cross-entropy loss twice as fast as a CRF with differences in F1 that are statistically insignificant, effectively eliminating the need for a CRF. We analyze the dynamics of tag co-occurrence to explain when these constraints are most effective and provide open source implementations of our tagger in both PyTorch and TensorFlow.
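To make the idea of suppressing illegal transitions concrete, here is a minimal sketch in plain Python. It is not the authors' released PyTorch or TensorFlow code. It assumes an IOB2 span encoding (the abstract does not name a specific scheme), and every name in it (`is_legal`, `build_transition_mask`, `constrained_decode`, `NEG_INF`) is hypothetical. The mask is fixed from the encoding rules rather than learned, and a Viterbi search over per-token scores uses it to reject illegal tag sequences.

```python
NEG_INF = -1e9  # large negative score standing in for "forbidden" (assumption)


def is_legal(prev_tag, next_tag):
    """IOB2 rule (assumed encoding): I-X may only follow B-X or I-X."""
    if next_tag.startswith("I-"):
        entity_type = next_tag[2:]
        return prev_tag in ("B-" + entity_type, "I-" + entity_type)
    return True


def build_transition_mask(tags):
    """mask[i][j] is 0.0 if tags[i] -> tags[j] is legal, NEG_INF otherwise."""
    return [[0.0 if is_legal(p, n) else NEG_INF for n in tags] for p in tags]


def constrained_decode(emissions, tags):
    """Viterbi search over per-token scores using only the fixed legality mask."""
    mask = build_transition_mask(tags)
    n_tags = len(tags)
    # A sequence may not start with I-X, so treat the start like following "O".
    start = [0.0 if is_legal("O", t) else NEG_INF for t in tags]
    scores = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for step in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for landing on tag j at this step.
            best_i = max(range(n_tags), key=lambda i: scores[i] + mask[i][j])
            new_scores.append(scores[best_i] + mask[best_i][j] + step[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[i] for i in path]


tags = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
# Made-up per-token scores: a greedy argmax would pick B-PER, I-LOC, O,
# which is illegal because I-LOC cannot follow B-PER.
emissions = [
    [0.1, 2.0, 0.0, 0.5, 0.0],
    [0.2, 0.0, 1.0, 0.0, 1.2],
    [2.0, 0.0, 0.1, 0.0, 0.0],
]
print(constrained_decode(emissions, tags))  # ['B-PER', 'I-PER', 'O']
```

On this toy input the per-token argmax would produce the illegal pair B-PER followed by I-LOC, while the constrained search returns the best-scoring legal sequence. No transition scores are trained here: the only transition information is the hand-written legality mask, which is the kind of prior knowledge of the span encoding that the abstract describes.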
Taxonomy
Methods: Conditional Random Field
