BERT for Long Documents: A Case Study of Automated ICD Coding

Arash Afkanpour; Shabir Adeel; Hansenclever Bassani; Arkady Epshteyn; Hongbo Fan; Isaac Jones; Mahan Malihi; Adrian Nauth; Raj Sinha; Sanjana Woonna; Shiva Zamani; Elli Kanal; Mikhail Fomitchev; Donny Cheung

arXiv:2211.02519·cs.CL·November 7, 2022·1 cites

Open Access

TL;DR

This paper presents a simple, scalable method for applying existing transformer models such as BERT to long documents. The method significantly improves on previously reported transformer results for automated ICD coding and outperforms one of the prominent CNN-based methods.

Contribution

The paper proposes a simple, scalable approach to adapt BERT for long texts, demonstrating improved performance in ICD coding tasks over prior transformer-based studies.

Findings

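01. With the proposed method, a transformer model outperforms one of the prominent CNN-based methods in ICD coding.

02. The proposed method significantly improves on the results previously reported for transformer models in ICD coding.

03. The results challenge the earlier conclusion that transformer models fail to outperform earlier solutions such as CNN-based models.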

Abstract

Transformer models have achieved great success across many NLP problems. However, previous studies in automated ICD coding concluded that these models fail to outperform some of the earlier solutions such as CNN-based models. In this paper we challenge this conclusion. We present a simple and scalable method to process long text with the existing transformer models such as BERT. We show that this method significantly improves the previous results reported for transformer models in ICD coding, and is able to outperform one of the prominent CNN-based methods.
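
The abstract does not spell out how long text is fed to BERT, so the following is only a minimal sketch of one generic chunk-and-aggregate pattern, not the authors' implementation. It assumes a document is split into overlapping windows that fit a fixed input limit, each window is scored independently, and per-label scores are max-pooled across windows. The names `split_into_chunks`, `score_document`, and `score_chunk`, as well as the window and stride sizes, are hypothetical.

```python
from typing import Callable, List, Sequence


def split_into_chunks(
    token_ids: Sequence[int], max_len: int = 512, stride: int = 256
) -> List[List[int]]:
    """Split a token sequence into overlapping windows of at most max_len tokens."""
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if stride > max_len:
        raise ValueError("stride must not exceed max_len, or tokens would be skipped")
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return chunks


def score_document(
    token_ids: Sequence[int],
    score_chunk: Callable[[List[int]], List[float]],
    max_len: int = 512,
    stride: int = 256,
) -> List[float]:
    """Score every chunk independently, then max-pool each label across chunks.

    score_chunk is a stand-in (assumption) for any model that maps one
    chunk of token ids to a list of per-label scores.
    """
    per_chunk = [score_chunk(c) for c in split_into_chunks(token_ids, max_len, stride)]
    # zip(*per_chunk) groups the scores of one label across all chunks.
    return [max(label_scores) for label_scores in zip(*per_chunk)]


def toy_scorer(chunk: List[int]) -> List[float]:
    """Toy two-label scorer: label 0 fires on token 7, label 1 on token 42."""
    return [1.0 if 7 in chunk else 0.0, 1.0 if 42 in chunk else 0.0]
```

In a multi-label setting such as ICD coding, max-pooling lets a code be predicted when any single window contains evidence for it. Other aggregations (mean-pooling, or attention over chunk representations) fit the same interface by changing the last line of `score_document`.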

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Music and Audio Processing

Methods: Attention Is All You Need · Layer Normalization · Residual Connection · Dropout · Softmax · WordPiece · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout