Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection
Masahiro Kaneko, Mamoru Komachi

TL;DR
This paper introduces a multi-head multi-layer attention model that leverages multiple layers of pre-trained language models like BERT to improve grammatical error detection, achieving state-of-the-art results across several datasets.
Contribution
It proposes a novel multi-head multi-layer attention approach that uses the intermediate layers of BERT, not only the final layer, for grammatical error detection, and it outperforms existing methods.
Findings
Achieved top scores on three grammatical error detection datasets.
Outperformed the previous state of the art by significant margins.
Showed that attending over multiple layers lets the model use information beyond the final layer.
Abstract
It is known that a deep neural network model pre-trained with large-scale data greatly improves the accuracy of various tasks, especially when there are resource constraints. However, the information needed to solve a given task can vary, and simply using the output of the final layer is not necessarily sufficient. Moreover, to our knowledge, exploiting large language representation models to detect grammatical errors has not yet been studied. In this work, we investigate the effect of utilizing information not only from the final layer but also from intermediate layers of a pre-trained language representation model to detect grammatical errors. We propose a multi-head multi-layer attention model that determines the appropriate layers in Bidirectional Encoder Representations from Transformers (BERT). The proposed method achieved the best scores on three datasets for grammatical error…
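To make the idea of attending over layers concrete, here is a minimal, dependency-free sketch. It is not the authors' formulation; it assumes each token has one hidden state per layer (for example 13 states for BERT-base: the embedding output plus 12 Transformer layers), that each attention head owns a learned query vector (the hypothetical `head_queries`), and that a single logistic output (the hypothetical `w_out`, `b_out`) labels each token as correct or erroneous. Each head scores every layer, softmax-normalizes the scores, and mixes the layers with those weights; the heads' outputs are concatenated before classification.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multi_head_layer_attention(layer_states, head_queries):
    """Combine one token's hidden states from all layers (illustrative sketch).

    layer_states: L vectors of size d, the token's hidden state at each layer.
    head_queries: H vectors of size d, one hypothetical learned query per head.
    Returns the concatenated per-head mixtures (size H * d) and each head's
    attention weights over the L layers.
    """
    d = len(layer_states[0])
    scale = math.sqrt(d)  # scaled dot-product scoring
    features, all_weights = [], []
    for q in head_queries:
        scores = [dot(q, h) / scale for h in layer_states]
        weights = softmax(scores)  # one weight per layer, summing to 1
        mixed = [sum(w * h[i] for w, h in zip(weights, layer_states))
                 for i in range(d)]
        features.extend(mixed)
        all_weights.append(weights)
    return features, all_weights


def detect_token(layer_states, head_queries, w_out, b_out):
    """Probability that a token is erroneous (hypothetical binary classifier)."""
    features, _ = multi_head_layer_attention(layer_states, head_queries)
    return sigmoid(dot(w_out, features) + b_out)


if __name__ == "__main__":
    random.seed(0)
    L, d, H = 13, 4, 2  # toy sizes; real BERT-base hidden size is 768
    states = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
    queries = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(H)]
    w_out = [random.uniform(-1, 1) for _ in range(H * d)]
    print("P(error) =", detect_token(states, queries, w_out, 0.0))
```

Giving each head its own query lets different heads favor different depths (for example, one leaning on lower layers and another on upper layers), which is one plausible reading of "determines the appropriate layers"; in a trained model these weights would be learned jointly with the classifier.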
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
