Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion
Jian Song, Di Liang, Rumei Li, Yuntao Li, Sirui Wang, Minlong Peng, Wei Wu, Yongxin Yu

TL;DR
This paper introduces DAFA, a novel method that explicitly incorporates dependency structure into pre-trained models like BERT, enhancing semantic matching performance by adaptively fusing structural and semantic information.
Contribution
The paper proposes a new dependency-enhanced attention mechanism with adaptive fusion, improving semantic matching by integrating dependency prior knowledge into pre-trained models.
Findings
Achieves state-of-the-art results on 10 datasets.
Demonstrates the effectiveness of dependency-aware attention.
Provides better interpretability of semantic matching models.
Abstract
Transformer-based pre-trained models like BERT have achieved great progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also shown general benefits in multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations is still unsettled. In this paper, we propose the Dependency-Enhanced Adaptive Fusion Attention (DAFA), which explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information. Specifically, (i) DAFA first proposes a structure-sensitive paradigm to construct a dependency matrix for calibrating attention weights. It adopts an adaptive fusion module to integrate the obtained dependency information and the original semantic signals. Moreover,…
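The two ideas named in the abstract, a dependency matrix that calibrates attention weights and an adaptive fusion of structural and semantic signals, can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation. The function names `dependency_matrix` and `fused_attention`, the head-index input format, the hard masking of non-dependency token pairs, and the sigmoid gate with weight `w_gate` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dependency_matrix(heads):
    """Build a symmetric token-by-token dependency matrix.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    (Hypothetical input format; any dependency parser output could be
    converted to it.)
    """
    n = len(heads)
    dep = np.eye(n)  # every token is linked to itself
    for child, head in enumerate(heads):
        if head >= 0:
            dep[child, head] = 1.0
            dep[head, child] = 1.0
    return dep


def fused_attention(q, k, v, dep, w_gate):
    """Single-head attention with a dependency-calibrated branch.

    q, k, v: arrays of shape (n, d); dep: (n, n); w_gate: (2 * d, d).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)

    # Semantic branch: ordinary scaled dot-product attention.
    sem_out = softmax(scores) @ v

    # Structural branch (assumption): only dependency-linked pairs may attend.
    dep_scores = np.where(dep > 0, scores, -1e9)
    dep_out = softmax(dep_scores) @ v

    # Adaptive fusion (assumption): an element-wise sigmoid gate decides how
    # much of each branch to keep.
    gate_in = np.concatenate([sem_out, dep_out], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(gate_in @ w_gate)))
    return gate * dep_out + (1.0 - gate) * sem_out


# Usage: "The cat sat", with The -> cat, cat -> sat, and sat as the root.
rng = np.random.default_rng(0)
dep = dependency_matrix([1, 2, -1])
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
w_gate = rng.normal(size=(8, 4))
out = fused_attention(q, k, v, dep, w_gate)  # shape (3, 4)
```

In this sketch, when every token pair is linked, the structural branch equals the semantic branch and the gate has no effect. The dependency matrix therefore only changes the output where the parse actually restricts attention.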
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · WordPiece · Layer Normalization · Residual Connection · Dropout · Softmax · Adam · Attention Dropout · Linear Warmup With Linear Decay
