TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Zhiheng Huang, Peng Xu, Davis Liang, Ajay Mishra, Bing Xiang

TL;DR
This paper introduces TRANS-BLSTM, an architecture that combines the transformer with bidirectional LSTM (BLSTM) layers to improve performance on NLP tasks, and shows consistent accuracy gains over BERT baselines on benchmarks such as GLUE and SQuAD 1.1.
Contribution
The paper proposes TRANS-BLSTM, an architecture that integrates a BLSTM layer into each transformer block and improves NLP task accuracy over BERT baselines.
Findings
TRANS-BLSTM outperforms BERT baselines on GLUE and SQuAD 1.1.
Achieves an F1 score of 94.01% on SQuAD 1.1.
Demonstrates the effectiveness of combining transformer and BLSTM architectures.
Abstract
Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks including sentence classification, machine translation, and question answering. The BERT model architecture is derived primarily from the transformer. Prior to the transformer era, bidirectional Long Short-Term Memory (BLSTM) had been the dominant modeling architecture for neural machine translation and question answering. In this paper, we investigate how these two modeling techniques can be combined to create a more powerful model architecture. We propose a new architecture denoted as Transformer with BLSTM (TRANS-BLSTM), which has a BLSTM layer integrated into each transformer block, leading to a joint modeling framework for transformer and BLSTM. We show that TRANS-BLSTM models consistently lead to improvements in accuracy compared to BERT…
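The abstract only says that a BLSTM layer is integrated into each transformer block; it does not say where. The NumPy sketch below shows one plausible wiring for a single sequence and is not the authors' implementation. Assumptions (all hypothetical): single-head attention instead of BERT's multi-head attention, the BLSTM placed in parallel with the position-wise feed-forward sublayer with both outputs summed into the residual, d // 2 hidden units per LSTM direction, no dropout, and no learnable layer-norm parameters. The names `trans_blstm_block` and `init_params` are illustrative only.

```python
import numpy as np


def layer_norm(x, eps=1e-12):
    # Normalize each position's feature vector (learnable gain/bias omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gelu(x):
    # tanh approximation of GELU, the activation used in BERT's feed-forward layers.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def self_attention(x, p):
    # Single-head scaled dot-product attention (an assumption; BERT uses multiple heads).
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ p["Wo"]


def lstm(x, W, U, b):
    # Unidirectional LSTM over x of shape (T, d_in); gate order is i, f, g, o.
    hid = U.shape[0]
    h, c = np.zeros(hid), np.zeros(hid)
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i, f = sigmoid(z[:hid]), sigmoid(z[hid:2 * hid])
        g, o = np.tanh(z[2 * hid:3 * hid]), sigmoid(z[3 * hid:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def blstm(x, p):
    # One LSTM reads left-to-right, one reads right-to-left; concatenate per position.
    fwd = lstm(x, *p["fwd"])
    bwd = lstm(x[::-1], *p["bwd"])[::-1]
    return np.concatenate([fwd, bwd], axis=-1)


def trans_blstm_block(x, p):
    # Sublayer 1: self-attention with residual connection and layer norm.
    h = layer_norm(x + self_attention(x, p))
    # Sublayer 2 (assumed design): the FFN and the BLSTM both read h, and their
    # outputs are summed with the residual before the final layer norm.
    ffn = gelu(h @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]
    return layer_norm(h + ffn + blstm(h, p))


def init_params(d, d_ff, rng):
    # Hypothetical random initialization. Each LSTM direction uses d // 2 units so the
    # concatenated BLSTM output has width d and can be added to the residual stream.
    def mat(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    hid = d // 2

    def lstm_params():
        return (mat(d, 4 * hid), mat(hid, 4 * hid), np.zeros(4 * hid))

    return {
        "Wq": mat(d, d), "Wk": mat(d, d), "Wv": mat(d, d), "Wo": mat(d, d),
        "W1": mat(d, d_ff), "b1": np.zeros(d_ff), "W2": mat(d_ff, d), "b2": np.zeros(d),
        "fwd": lstm_params(), "bwd": lstm_params(),
    }


rng = np.random.default_rng(0)
params = init_params(d=8, d_ff=32, rng=rng)
tokens = rng.normal(size=(5, 8))  # 5 token embeddings of width 8
out = trans_blstm_block(tokens, params)
print(out.shape)  # (5, 8)
```

Giving each direction half the model width is what lets the BLSTM output join the residual sum without an extra projection; if the actual model uses a different BLSTM width, a linear projection back to the model width would be needed.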
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections
