A text autoencoder from transformer for fast encoding language representation

Tan Huang

arXiv:2111.02844 · cs.CL · November 5, 2021

TL;DR

This paper introduces a fast, resource-efficient transformer-based autoencoder for language representation that reduces computational complexity and improves performance on classification and semantic similarity tasks.

Contribution

It proposes a novel deep bidirectional language model with window masking, achieving O(n) complexity and superior performance compared to traditional BERT-like models.

Findings

01

Higher accuracy in SMS classification using CPU-based embeddings

02

Significantly better performance in semantic similarity tasks

03

Reduced computational complexity from O(n^2) to O(n)

Abstract

In recent years, BERT has shown clear advantages and great potential in natural language processing tasks. However, both training and applying BERT require intensive time and resources to compute contextual language representations, which hinders its universality and applicability. To overcome this bottleneck, we propose a deep bidirectional language model that uses a window masking mechanism at the attention layer. This work computes contextual language representations without the random masking used in BERT, while maintaining a deep bidirectional architecture like BERT's. To compute the same sentence representation, our method has O(n) complexity, compared with O(n^2) for other transformer-based models. To further demonstrate its superiority, we compute contextual language representations in CPU environments; using the embeddings from the proposed method, logistic regression…
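To make the complexity claim concrete, here is a minimal sketch of attention restricted to a local window. It is an illustration under stated assumptions, not the paper's implementation: this page does not describe the exact masking scheme, so the symmetric window of half-width `w` and the helper names `window_mask`, `windowed_attention`, and `count_scored_pairs` are all hypothetical. The sketch shows only why a fixed window makes the number of scored token pairs grow linearly with sequence length n rather than quadratically.

```python
import math

# Assumption: each token attends to neighbors at most w positions away,
# on both sides (so the context stays bidirectional). The paper's actual
# window masking may differ; this only illustrates the O(n) argument.

def window_mask(n, w):
    """n x n boolean matrix; entry [i][j] is True if token i may attend to token j."""
    return [[abs(i - j) <= w for j in range(n)] for i in range(n)]

def windowed_attention(queries, keys, values, w):
    """Scaled dot-product attention where row i only scores keys inside its window."""
    n, d = len(queries), len(queries[0])
    dv = len(values[0])
    outputs = []
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        # Score only the (at most 2w + 1) keys inside the window.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in range(lo, hi)
        ]
        # Numerically stable softmax over the window.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(wt * values[j][c] for wt, j in zip(weights, range(lo, hi))) / total
            for c in range(dv)
        ])
    return outputs

def count_scored_pairs(n, w):
    """Number of query-key pairs scored; bounded by n * (2w + 1)."""
    return sum(min(n, i + w + 1) - max(0, i - w) for i in range(n))
```

With `w` held fixed, `count_scored_pairs(n, w)` never exceeds `n * (2 * w + 1)`, whereas full self-attention scores all `n * n` pairs. Note that a dense mask such as the one `window_mask` returns still occupies n^2 entries; the saving comes from iterating only over the window, as `windowed_attention` does.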

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Layer Normalization · Residual Connection · Dense Connections · Attention Dropout · Softmax