TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding

Zhiheng Huang; Peng Xu; Davis Liang; Ajay Mishra; Bing Xiang

arXiv:2003.07000 · cs.CL · March 17, 2020 · 25 citations



TL;DR

This paper introduces TRANS-BLSTM, an architecture that integrates a bidirectional LSTM (BLSTM) layer into each transformer block, and reports consistent accuracy improvements over BERT baselines on the GLUE and SQuAD 1.1 benchmarks.

Contribution

The paper proposes TRANS-BLSTM, which adds a BLSTM layer to every transformer block so that transformer and BLSTM are modeled jointly, improving accuracy over BERT baselines.

Findings

01. TRANS-BLSTM outperforms BERT baselines on GLUE and SQuAD 1.1.
02. TRANS-BLSTM achieves an F1 score of 94.01% on SQuAD 1.1.
03. Integrating a BLSTM layer into each transformer block improves accuracy over the transformer-only BERT architecture.

Abstract

Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks including sentence classification, machine translation, and question answering. The BERT model architecture is derived primarily from the transformer. Prior to the transformer era, bidirectional Long Short-Term Memory (BLSTM) was the dominant modeling architecture for neural machine translation and question answering. In this paper, we investigate how these two modeling techniques can be combined to create a more powerful model architecture. We propose a new architecture denoted as Transformer with BLSTM (TRANS-BLSTM), which has a BLSTM layer integrated into each transformer block, leading to a joint modeling framework for transformer and BLSTM. We show that TRANS-BLSTM models consistently lead to improvements in accuracy compared to BERT…
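The abstract describes the architecture only as "a BLSTM layer integrated into each transformer block." The NumPy sketch below shows one plausible way to wire such a block, under loudly stated assumptions that are not taken from the paper: the BLSTM branch runs in parallel with the position-wise feed-forward layer and their outputs are summed into the residual stream; each LSTM direction uses a hidden size of d/2 so the concatenated output has width d; attention is single-head; ReLU stands in for BERT's GELU; and layer norm has no learned gain or bias. All names (`trans_blstm_block`, `blstm`, `init_params`) are hypothetical and illustrative, not the authors' implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector to zero mean and unit variance
    # (no learned gain or bias, for brevity).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over a (seq_len, d) input.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm(x, w, u, b):
    # Unidirectional LSTM over x of shape (seq_len, d_in).
    # w: (d_in, 4h), u: (h, 4h), b: (4h,); gate order: input, forget, candidate, output.
    h_size = u.shape[0]
    h = np.zeros(h_size)
    c = np.zeros(h_size)
    outputs = []
    for x_t in x:
        z = x_t @ w + h @ u + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def blstm(x, fwd, bwd):
    # Run one LSTM left-to-right and one right-to-left, then concatenate per position.
    forward = lstm(x, *fwd)
    backward = lstm(x[::-1], *bwd)[::-1]  # reverse back so positions line up
    return np.concatenate([forward, backward], axis=-1)

def trans_blstm_block(x, p):
    # Attention sublayer with residual connection and post-layer-norm, as in BERT.
    x = layer_norm(x + self_attention(x, p["wq"], p["wk"], p["wv"]))
    # Position-wise feed-forward branch (ReLU here; BERT uses GELU).
    ffn = np.maximum(0.0, x @ p["w1"]) @ p["w2"]
    # ASSUMPTION: BLSTM branch runs in parallel with the FFN; branch outputs are summed.
    lstm_out = blstm(x, p["fwd"], p["bwd"])
    return layer_norm(x + ffn + lstm_out)

def init_params(d, d_ff, rng):
    h = d // 2  # ASSUMPTION: per-direction hidden size d/2, so the BLSTM output has width d

    def m(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    def lstm_params():
        return (m(d, 4 * h), m(h, 4 * h), np.zeros(4 * h))

    return {"wq": m(d, d), "wk": m(d, d), "wv": m(d, d),
            "w1": m(d, d_ff), "w2": m(d_ff, d),
            "fwd": lstm_params(), "bwd": lstm_params()}

rng = np.random.default_rng(0)
params = init_params(d=8, d_ff=32, rng=rng)
tokens = rng.normal(size=(5, 8))  # 5 positions, model width 8
out = trans_blstm_block(tokens, params)
print(out.shape)  # (5, 8)
```

Choosing a per-direction hidden size of d/2 lets the concatenated forward and backward states be added to the residual stream directly; a wider BLSTM would instead need a linear projection back to width d before the sum.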


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections