TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection
Siddhant Garg, Thuy Vu, Alessandro Moschitti

TL;DR
TANDA is a two-step fine-tuning approach for pre-trained Transformer models that improves answer sentence selection accuracy and robustness, outperforming previous methods on benchmark datasets and demonstrating industrial applicability.
Contribution
The paper introduces TANDA, a novel transfer and adaptation fine-tuning method for Transformer models that enhances performance and robustness in answer sentence selection tasks.
Findings
Achieved state-of-the-art MAP scores of 92% on WikiQA and 94.3% on TREC-QA.
TANDA produces more stable and robust models with less hyper-parameter tuning effort.
TANDA remains effective on noisy datasets and in industrial-domain applications.
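The scores above are reported as MAP (mean average precision). As a minimal sketch of how that metric is typically computed for answer sentence selection, the snippet below ranks each question's candidate sentences by model score and averages the per-question average precision. The function names, the toy data, and the choice to skip questions with no correct candidate are assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Tuple


def average_precision(ranked_labels: List[int]) -> float:
    """Average precision for one question, given labels in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this correct answer
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    candidates: Dict[str, List[Tuple[float, int]]]
) -> float:
    """Mean of per-question average precision.

    `candidates` maps a question id to (model_score, label) pairs,
    where label is 1 for a correct answer sentence and 0 otherwise.
    Assumption: questions without any correct candidate are skipped.
    """
    ap_values = []
    for scored in candidates.values():
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        labels = [label for _, label in ranked]
        if sum(labels) == 0:
            continue
        ap_values.append(average_precision(labels))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


# Hypothetical toy data: three questions with scored candidate sentences.
example = {
    "q1": [(0.9, 1), (0.4, 0), (0.2, 1)],  # AP = (1/1 + 2/3) / 2
    "q2": [(0.7, 0), (0.6, 1)],            # AP = 1/2
    "q3": [(0.5, 0), (0.1, 0)],            # no correct answer, skipped
}
map_score = mean_average_precision(example)
print(f"MAP = {map_score:.4f}")
```

On this toy input the two scored questions have average precision 5/6 and 1/2, so the mean is 2/3.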
Abstract
We propose TANDA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it with a large and high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer sentence selection, which is a well-known inference task in Question Answering. We built a large scale dataset to enable the transfer step, exploiting the Natural Questions dataset. Our approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving MAP scores of 92% and 94.3%, respectively, which largely outperform the previous highest scores of 83.4% and 87.5%, obtained in very recent work. We empirically show that TANDA generates more stable and…
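The transfer-then-adapt order described in the abstract can be sketched with a toy model. The snippet below is an illustration only: it uses a small logistic-regression classifier as a stand-in for a pre-trained Transformer, synthetic data in place of the large general dataset and the target-domain dataset, and arbitrary learning rates and epoch counts. All names (`fine_tune`, `make_data`, `general_data`, `target_data`) are hypothetical.

```python
import math
import random
from typing import List, Tuple

Example = Tuple[List[float], int]  # (feature vector, binary label)


def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(weights: List[float], data: List[Example],
              lr: float, epochs: int) -> List[float]:
    """Continue training from `weights` with plain SGD on the logistic loss."""
    w = list(weights)  # copy so the starting checkpoint is left untouched
    for _ in range(epochs):
        for features, label in data:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
            err = pred - label
            w = [wi - lr * err * xi for wi, xi in zip(w, features)]
    return w


def make_data(rng: random.Random, n: int, threshold: float) -> List[Example]:
    """Synthetic stand-in data: label is 1 when x0 + x1 exceeds a threshold."""
    data = []
    for _ in range(n):
        x0, x1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(([x0, x1, 1.0], 1 if x0 + x1 > threshold else 0))
    return data


rng = random.Random(0)
general_data = make_data(rng, 500, 0.0)  # large, general-task dataset
target_data = make_data(rng, 30, 0.3)    # small, target-domain dataset

pretrained = [0.0, 0.0, 0.0]  # stand-in for pre-trained weights

# Step 1 (transfer): fine-tune the pre-trained model on the general dataset.
transferred = fine_tune(pretrained, general_data, lr=0.1, epochs=3)

# Step 2 (adapt): fine-tune the transferred model on the target dataset.
adapted = fine_tune(transferred, target_data, lr=0.05, epochs=3)

print("transferred:", transferred)
print("adapted:    ", adapted)
```

The point of the sketch is the data flow: the second call starts from the weights produced by the first call rather than from the original pre-trained weights.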
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
