ESIE-BERT: Enriching Sub-words Information Explicitly with BERT for Joint Intent Classification and Slot Filling
Yu Guo, Zhilong Xie, Xingyan Chen, Huangen Chen, Leilei Wang, Huaming Du, Shaopeng Wei, Yu Zhao, Qing Li, Gang Wu

TL;DR
This paper introduces ESIE-BERT, a novel method that explicitly models sub-word features and sentence-level intent information to improve joint intent classification and slot filling in natural language understanding tasks.
Contribution
It proposes a sub-words attention adapter and an intent attention adapter to better utilize sub-word and sentence features in BERT-based models.
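To make the idea of attending over sub-word features concrete, here is a minimal sketch of attention pooling that merges the hidden vectors of a word's sub-tokens into one word-level vector. It is not the authors' adapter: the function names (`attention_pool`, `pool_words`), the choice of the first sub-token as the query, and the toy 2-dimensional vectors are all assumptions made for illustration. `word_ids` is assumed to map each sub-token position to the index of the word it came from.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(query, vectors):
    """Scaled dot-product attention of one query over a list of vectors."""
    d = len(query)
    scores = [sum(q * v for q, v in zip(query, vec)) / math.sqrt(d)
              for vec in vectors]
    weights = softmax(scores)
    pooled = [sum(w * vec[k] for w, vec in zip(weights, vectors))
              for k in range(d)]
    return pooled, weights

def pool_words(hidden, word_ids):
    """Merge sub-token vectors into one vector per word.

    Assumption: the first sub-token of each word acts as the query.
    """
    groups = {}
    for vec, wid in zip(hidden, word_ids):
        groups.setdefault(wid, []).append(vec)
    return [attention_pool(vecs[0], vecs)[0]
            for _, vecs in sorted(groups.items())]

# Toy input: word 0 has one sub-token, word 1 has two.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_ids = [0, 1, 1]
word_vectors = pool_words(hidden, word_ids)
print(len(word_vectors))  # one vector per word
```

With this pooling, the output sequence has one vector per original word, so its length matches the slot-label sequence again while every sub-token still contributes to the result.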
Findings
Significant improvement in slot filling F1 score on the ATIS dataset (from 96.1 to 98.2)
Enhanced model performance on two benchmark datasets
Addresses the sub-word mismatch issue in BERT for NLU tasks
Abstract
Natural language understanding (NLU) has two core tasks: intent classification and slot filling. The success of pre-trained language models has resulted in a significant breakthrough in both tasks. One promising solution, BERT, can jointly optimize the two tasks. We note that BERT-based models convert each complex token into multiple sub-tokens with the wordpiece algorithm, which creates a mismatch between the lengths of the token sequence and the label sequence. As a result, BERT-based models do not perform well in label prediction, which limits further performance improvement. Many existing models can accommodate this issue, but some hidden semantic information is discarded in the fine-tuning process. We address the problem by introducing a novel joint method on top of BERT which explicitly models the multiple sub-token features after wordpiece tokenization, thereby contributing to the two…
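The length mismatch described in the abstract can be reproduced with a toy example. The sketch below implements a greedy longest-match-first split in the style of wordpiece over a tiny hypothetical vocabulary (`VOCAB`); it is not BERT's real vocabulary or tokenizer, and the sentence and slot labels are illustrative, written in the style of ATIS annotations.

```python
# Hypothetical mini-vocabulary, chosen only so that one word gets split.
VOCAB = {"book", "a", "flight", "to", "pitts", "##burgh", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    """Greedy longest-match-first split of one word into sub-tokens."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no valid split for this word
        pieces.append(piece)
        start = end
    return pieces

def tokenize(words):
    """Return sub-tokens and, for each one, the index of its source word."""
    sub_tokens, word_ids = [], []
    for i, word in enumerate(words):
        pieces = wordpiece(word)
        sub_tokens.extend(pieces)
        word_ids.extend([i] * len(pieces))
    return sub_tokens, word_ids

words = ["book", "a", "flight", "to", "pittsburgh"]
labels = ["O", "O", "O", "O", "B-toloc.city_name"]  # one label per word

sub_tokens, word_ids = tokenize(words)
print(sub_tokens)                    # "pittsburgh" is split into two pieces
print(len(sub_tokens), len(labels))  # 6 sub-tokens vs. 5 labels
```

Here five words carry five slot labels, but the model sees six sub-tokens. A common workaround is to predict a label only from the first sub-token of each word, which is the kind of approach the abstract says can discard information held in the remaining sub-tokens.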
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Adapter · Linear Layer · Weight Decay · Residual Connection · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay
