Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents

Emanuela Boros; Maud Ehrmann

arXiv:2409.16934·cs.CL·November 19, 2024

TL;DR

This study identifies OCR-sensitive neurons in Transformer models and demonstrates that neutralising these neurons enhances named entity recognition accuracy on noisy historical documents.

Contribution

The paper reveals OCR-sensitive neurons in Transformer models and shows that neutralising them improves NER performance on noisy historical texts.

Findings

1. OCR-sensitive neurons exist in Transformer models.
2. Neutralising OCR-sensitive neurons improves NER accuracy.
3. Performance gains are observed on historical newspapers and classical commentaries.

Abstract

This paper investigates the presence of OCR-sensitive neurons within the Transformer architecture and their influence on named entity recognition (NER) performance on historical documents. By analysing neuron activation patterns in response to clean and noisy text inputs, we identify and then neutralise OCR-sensitive neurons to improve model performance. Based on two open access large language models (Llama2 and Mistral), experiments demonstrate the existence of OCR-sensitive regions and show improvements in NER performance on historical newspapers and classical commentaries, highlighting the potential of targeted neuron modulation to improve models' performance on noisy text.
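
The identify-then-neutralise idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the scoring rule (mean absolute activation difference between paired clean and noisy inputs), the top-k selection, the zeroing step, and all function names are assumptions made for illustration, and the activations are toy NumPy arrays rather than values read from Llama2 or Mistral.

```python
import numpy as np

def neuron_sensitivity(clean_acts, noisy_acts):
    """Mean absolute activation difference per neuron (assumed scoring rule).

    Both arrays have shape (n_examples, n_neurons) and are assumed to be
    aligned: row i holds activations for the clean and the noisy version
    of the same text.
    """
    return np.abs(clean_acts - noisy_acts).mean(axis=0)

def top_k_sensitive(sensitivity, k):
    """Indices of the k neurons with the highest sensitivity score."""
    return np.argsort(sensitivity)[::-1][:k]

def neutralise(acts, neuron_ids):
    """Return a copy of acts with the selected neurons set to zero."""
    out = acts.copy()
    out[:, neuron_ids] = 0.0
    return out

# Toy data: 4 paired examples, 5 neurons.
# Neurons 1 and 3 are constructed to react to the simulated noise.
rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 5))
noisy = clean.copy()
noisy[:, 1] += 2.0
noisy[:, 3] -= 1.0

scores = neuron_sensitivity(clean, noisy)
sensitive = top_k_sensitive(scores, k=2)
patched = neutralise(noisy, sensitive)

print("sensitivity per neuron:", np.round(scores, 3))
print("selected neurons:", sensitive.tolist())
```

With a real Transformer, the activations would have to be captured from the model's layers and the neutralisation applied during the forward pass; how the paper does this is not stated on this page.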

Code & Models

Repositories

emanuelaboros/ocr-sensitive-neurons (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections