Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction

Maksym Tarnavskyi; Artem Chernodub; Kostiantyn Omelianchuk

arXiv:2203.13064 · cs.CL · March 25, 2022 · 1 citation

TL;DR

This paper enhances grammatical error correction by ensembling Transformer-based sequence taggers and using knowledge distillation to create synthetic datasets, achieving state-of-the-art results without synthetic pre-training.

Contribution

It introduces a novel ensembling approach for sequence taggers and demonstrates effective knowledge distillation for generating training data, improving GEC performance.
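The distillation step can be pictured as running a trained teacher (here, the ensemble) over unannotated text and keeping each (source, teacher output) pair as synthetic training data for a single student model. The sketch below is a minimal illustration under that assumption only: `Teacher`, `distill_dataset`, `toy_teacher`, and the optional `keep_unchanged` filter are hypothetical names and choices, not the pipeline used to build Troy-Blogs or Troy-1BW.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical teacher interface: maps a raw sentence to its corrected version.
# In the paper's setting this role is played by the trained ensemble.
Teacher = Callable[[str], str]


def distill_dataset(
    teacher: Teacher,
    sentences: Iterable[str],
    keep_unchanged: bool = True,
) -> List[Tuple[str, str]]:
    """Build (source, target) training pairs by letting the teacher correct raw text."""
    pairs: List[Tuple[str, str]] = []
    for source in sentences:
        target = teacher(source)
        # Assumed, optional filter: drop sentences the teacher left untouched.
        if target == source and not keep_unchanged:
            continue
        pairs.append((source, target))
    return pairs


def toy_teacher(sentence: str) -> str:
    """Stand-in for a trained ensemble: fixes a single hard-coded typo."""
    return " ".join("the" if tok == "teh" else tok for tok in sentence.split())


if __name__ == "__main__":
    corpus = ["I saw teh cat .", "It is fine ."]
    print(distill_dataset(toy_teacher, corpus))
    print(distill_dataset(toy_teacher, corpus, keep_unchanged=False))
```

The student is then trained on these pairs exactly as on any parallel GEC corpus; the quality of the synthetic targets is bounded by the teacher, which is why a stronger ensemble makes a better data generator.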

Findings

01

Ensembling models achieves a new SOTA $F_{0.5}$ score of 76.05 on BEA-2019.

02

Knowledge distillation with ensemble-generated data improves single model performance.

03

The best single model achieves an $F_{0.5}$ score of 73.21, approaching much heavier models such as T5.
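For reference, the $F_{0.5}$ score reported above combines precision $P$ and recall $R$ as $F_{\beta} = (1+\beta^2)\,P R / (\beta^2 P + R)$ with $\beta = 0.5$, so precision counts more than recall. In the BEA-2019 setting the true-positive, false-positive, and false-negative counts come from span-level edit matching (ERRANT); the sketch below only performs the final arithmetic, and the counts in the example are hypothetical, not results from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta over edit counts; beta < 1 weights precision above recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: P = 0.75, R = 0.60.
    print(round(f_beta(tp=60, fp=20, fn=40), 4))  # prints 0.7143
```

With the same counts, $F_1$ would be about 0.667; the higher $F_{0.5}$ reflects that a GEC system which proposes fewer but more reliable edits is rewarded.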

Abstract

In this paper, we investigate improvements to the GEC sequence tagging architecture, focusing on ensembles of recent cutting-edge Transformer-based encoders in their Large configurations. We advocate ensembling models by majority vote on span-level edits, because this approach is tolerant of differences in model architecture and vocabulary size. Our best ensemble achieves a new SOTA result with an $F_{0.5}$ score of 76.05 on BEA-2019 (test), even without pre-training on synthetic datasets. In addition, we perform knowledge distillation with a trained ensemble to generate new synthetic training datasets, "Troy-Blogs" and "Troy-1BW". Our best single sequence tagging model, pretrained on the generated Troy datasets combined with the publicly available synthetic PIE dataset, achieves a near-SOTA (To the best of our knowledge, our best single model gives way only to much heavier T5 model…
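Majority voting on span-level edits can be sketched as follows, assuming each model's output has already been converted into a set of (start, end, replacement) edits over the source tokens. The `Edit` tuple, the default majority threshold, and the rule of skipping an edit that overlaps one already applied are assumptions for illustration; they are not taken from the released makstarnavskyi/gector-large code.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

# Hypothetical edit format: (start token, end token exclusive, replacement).
# An empty replacement deletes the span; start == end inserts before `start`.
Edit = Tuple[int, int, str]


def vote_edits(model_edits: List[Set[Edit]], min_votes: Optional[int] = None) -> List[Edit]:
    """Keep edits proposed by at least `min_votes` models (default: strict majority)."""
    if min_votes is None:
        min_votes = len(model_edits) // 2 + 1
    counts = Counter(edit for edits in model_edits for edit in edits)
    return sorted(edit for edit, votes in counts.items() if votes >= min_votes)


def apply_edits(tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply edits left to right, skipping any that overlap an earlier one."""
    out: List[str] = []
    pos = 0
    for start, end, replacement in sorted(edits):
        if start < pos:  # overlaps a span we already rewrote
            continue
        out.extend(tokens[pos:start])
        out.extend(replacement.split())
        pos = end
    out.extend(tokens[pos:])
    return out


if __name__ == "__main__":
    source = "He go to school yesterday".split()
    model_a = {(1, 2, "went")}
    model_b = {(1, 2, "went"), (3, 3, "the")}
    model_c = {(1, 2, "goes")}
    kept = vote_edits([model_a, model_b, model_c])
    print(kept)  # prints [(1, 2, 'went')]
    print(" ".join(apply_edits(source, kept)))  # prints He went to school yesterday
```

Because the vote compares edits on the source text rather than per-token probabilities, models with different encoders, tokenizers, and tag vocabularies can be combined without aligning their output distributions, which is the property the abstract highlights.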

Code & Models

Repositories

makstarnavskyi/gector-large
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Adafactor · SentencePiece · Gated Linear Unit · Dropout