Stacked DeBERT: All Attention in Incomplete Data for Text Classification

Gwenaelle Cunha Sergio; Minho Lee

arXiv:2001.00137 · cs.CL · January 15, 2021

TL;DR

This paper introduces Stacked DeBERT, a novel model that enhances robustness in text classification tasks involving incomplete or noisy data by employing a denoising transformer architecture.

Contribution

The paper proposes a new encoding scheme in BERT using denoising transformers and multilayer perceptrons to better handle incomplete and noisy text data.

Findings

01. Improved F1-scores on benchmark datasets.
02. Enhanced robustness in informal and incorrect texts.
03. Effective reconstruction of missing word embeddings.

Abstract

In this paper, we propose Stacked DeBERT, short for Stacked Denoising Bidirectional Encoder Representations from Transformers. This novel model improves robustness in incomplete data, when compared to existing systems, by designing a novel encoding scheme in BERT, a powerful language representation model solely based on attention mechanisms. Incomplete data in natural language processing refer to text with missing or incorrect words, and its presence can hinder the performance of current models that were not implemented to withstand such noises, but must still perform well even under duress. This is due to the fact that current approaches are built for and trained with clean and complete data, and thus are not able to extract features that can adequately represent incomplete data. Our proposed approach consists of obtaining intermediate input representations by applying an embedding…

Code & Models

Repositories

gcunhase/StackedDeBERT · PyTorch · Official

Taxonomy

Methods: Linear Layer · Weight Decay · Residual Connection · Adam · Layer Normalization · Softmax · Attention Is All You Need · Dropout · Multi-Head Attention