# Fast and Accurate Entity Recognition with Iterated Dilated Convolutions

**Authors:** Emma Strubell, Patrick Verga, David Belanger, Andrew McCallum

arXiv: 1702.02098 · 2017-07-25

## TL;DR

This paper introduces Iterated Dilated Convolutional Neural Networks (ID-CNNs) as a fast alternative to Bi-LSTMs for Named Entity Recognition (NER), delivering 14-20x test-time speedups over a Bi-LSTM-CRF while keeping accuracy comparable.

## Contribution

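The paper presents ID-CNNs, a convolutional architecture whose combination of network structure, parameter sharing, and training procedure exploits GPU parallelism far better than Bi-LSTMs. On NER it matches Bi-LSTM-CRF accuracy at a fraction of the test-time cost, and document-level ID-CNNs exceed it.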

## Key findings

- ID-CNNs run 14-20x faster at test time than Bi-LSTM-CRFs.
- Their accuracy is comparable to that of Bi-LSTM-CRFs.
- ID-CNNs trained to aggregate context from entire documents are even more accurate, while still running 8x faster at test time.
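The document-level result depends on how quickly dilated convolutions widen the context each token sees. The following is a minimal sketch of that arithmetic. It assumes odd-width (width-3) filters and a hypothetical dilation schedule of 1, 2, 4, 8 chosen for illustration, which is not necessarily the paper's configuration. It compares a plain stack, a dilated stack, and a dilated block iterated several times with shared parameters.

```python
def receptive_field(dilations, width=3):
    """Number of input tokens visible to one output position after a stack
    of dilated convolutions. Each layer with filter width w and dilation d
    adds (w - 1) * d tokens of context, split evenly left and right for odd w."""
    rf = 1
    for d in dilations:
        rf += (width - 1) * d
    return rf


def iterated_receptive_field(dilations, iterations, width=3):
    """Receptive field when the same block (shared parameters) is applied
    `iterations` times in sequence; each pass adds the block's extra context."""
    return 1 + iterations * (receptive_field(dilations, width) - 1)


# Hypothetical schedule, for illustration only.
block = [1, 2, 4, 8]
print(receptive_field([1, 1, 1, 1]))        # 9   (undilated: linear growth)
print(receptive_field(block))               # 31  (dilated: exponential growth)
print(iterated_receptive_field(block, 4))   # 121 (same block applied 4 times)
```

With the same four layers, dilation triples the context (31 vs. 9 tokens) and iterating the block grows it further without adding parameters. This is why a fixed-depth network can plausibly reach across long documents.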

## Abstract

Today when many practitioners run basic NLP on the entire web and large-volume traffic, faster methods are paramount to saving time and energy costs. Recent advances in GPU hardware have led to the emergence of bi-directional LSTMs as a standard method for obtaining per-token vector representations serving as input to labeling tasks such as NER (often followed by prediction in a linear-chain CRF). Though expressive and accurate, these models fail to fully exploit GPU parallelism, limiting their computational efficiency. This paper proposes a faster alternative to Bi-LSTMs for NER: Iterated Dilated Convolutional Neural Networks (ID-CNNs), which have better capacity than traditional CNNs for large context and structured prediction. Unlike LSTMs whose sequential processing on sentences of length N requires O(N) time even in the face of parallelism, ID-CNNs permit fixed-depth convolutions to run in parallel across entire documents. We describe a distinct combination of network structure, parameter sharing and training procedures that enable dramatic 14-20x test-time speedups while retaining accuracy comparable to the Bi-LSTM-CRF. Moreover, ID-CNNs trained to aggregate context from the entire document are even more accurate while maintaining 8x faster test time speeds.
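To make the "fixed-depth convolutions run in parallel" idea concrete, here is a simplified sketch of a dilated 1-D convolution and an iterated block that reuses the same filters on every pass. It rests on several assumptions. It uses scalar (single-channel) tokens, ReLU activations, zero padding, and illustrative all-ones filters. The paper's model operates on vector-valued token representations and adds its own training procedure, which is not reproduced here. The names `dilated_conv1d` and `id_cnn_block` are hypothetical.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int, bias: float = 0.0) -> List[float]:
    """'Same'-padded 1-D dilated convolution over a single channel.
    Output position t combines x[t + (k - half) * dilation] for k in
    0..width-1; indices outside the sequence read as zero."""
    width = len(kernel)
    half = width // 2
    n = len(x)
    out = []
    for t in range(n):  # every t is independent, so all positions can run in parallel
        acc = bias
        for k in range(width):
            i = t + (k - half) * dilation
            if 0 <= i < n:
                acc += kernel[k] * x[i]
        out.append(acc)
    return out


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def id_cnn_block(x: Sequence[float], kernels: Sequence[Sequence[float]],
                 dilations: Sequence[int], iterations: int) -> List[float]:
    """Apply one stack of dilated convolutions `iterations` times, reusing
    the same kernels on every pass (parameter sharing across iterations)."""
    h = list(x)
    for _ in range(iterations):
        for kernel, d in zip(kernels, dilations):
            h = relu(dilated_conv1d(h, kernel, d))
    return h


# Impulse test: a single nonzero token shows how far context spreads.
x = [0.0] * 31
x[15] = 1.0
ones = [1.0, 1.0, 1.0]
out = id_cnn_block(x, [ones, ones, ones], [1, 2, 4], iterations=1)
print(sum(1 for v in out if v != 0))  # 15
```

The depth of this computation is set by the number of layers and iterations, not by the sequence length. A Bi-LSTM, by contrast, must step through all N tokens in order.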

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.02098/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1702.02098/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/1702.02098/full.md

---
Source: https://tomesphere.com/paper/1702.02098