TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection

Siddhant Garg; Thuy Vu; Alessandro Moschitti

arXiv:1911.04118·cs.CL·November 21, 2019

TL;DR

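TANDA is a two-step fine-tuning approach for pre-trained Transformer models: the model is first transferred to the general answer sentence selection task using a large, high-quality dataset, then adapted to the target domain. It improves accuracy and robustness, outperforms previous methods on WikiQA and TREC-QA, and is shown to be effective in industrial settings.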

Contribution

The paper introduces TANDA, a novel transfer and adaptation fine-tuning method for Transformer models that enhances performance and robustness in answer sentence selection tasks.

Findings

1. Achieved state-of-the-art MAP scores of 92% on WikiQA and 94.3% on TREC-QA, up from the previous best scores of 83.4% and 87.5%.

2. TANDA produces more stable and robust models with less hyper-parameter tuning effort.

3. Remains effective on noisy datasets and in industrial-domain applications.
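The MAP figures above are mean average precision scores. As a reading aid, here is a minimal sketch of how MAP is commonly computed for answer sentence selection. It is not the authors' evaluation script: the function names are hypothetical, and skipping questions that have no correct candidate is an assumption that may differ from the benchmark setup used in the paper.

```python
from typing import List, Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision for one question.

    `ranked_labels` holds 1 (correct) or 0 (incorrect) for each candidate
    sentence, ordered from the highest-scored candidate to the lowest.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_question: List[Sequence[int]]) -> float:
    """Mean of the per-question average precision values.

    Assumption: questions with no correct candidate are skipped.
    """
    evaluated = [labels for labels in per_question if any(labels)]
    if not evaluated:
        return 0.0
    return sum(average_precision(labels) for labels in evaluated) / len(evaluated)


# Hypothetical rankings for three questions (best-scored candidate first).
rankings = [[1, 0, 1], [0, 1], [0, 0]]
map_score = mean_average_precision(rankings)
```

A model that always ranks every correct sentence above every incorrect one would score 1.0 under this definition.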

Abstract

We propose TANDA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it with a large and high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer sentence selection, which is a well-known inference task in Question Answering. We built a large scale dataset to enable the transfer step, exploiting the Natural Questions dataset. Our approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving MAP scores of 92% and 94.3%, respectively, which largely outperform the previous highest scores of 83.4% and 87.5%, obtained in very recent work. We empirically show that TANDA generates more stable and…
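The two fine-tuning steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a small logistic-regression scorer stands in for the Transformer, the data is synthetic, and every name and hyper-parameter (`fine_tune`, `make_data`, learning rates, epoch counts) is a hypothetical choice made for illustration. The only point it shows is the sequencing: fine-tune on a large general-task set first, then continue from those weights on a small target-domain set.

```python
import math
import random
from typing import List, Tuple

# One example: (feature vector for a question-sentence pair, 0/1 label).
Example = Tuple[List[float], int]


def score(weights: List[float], x: List[float]) -> float:
    """Probability that the candidate sentence answers the question."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(weights: List[float], data: List[Example],
              epochs: int, lr: float) -> List[float]:
    """Plain SGD on logistic loss, starting from `weights`; returns a copy."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = score(w, x)
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def make_data(n: int, true_w: List[float], rng: random.Random) -> List[Example]:
    """Synthetic, linearly separable data (a stand-in for real QA pairs)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = 1 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0
        data.append((x, y))
    return data


def accuracy(weights: List[float], data: List[Example]) -> float:
    correct = sum((score(weights, x) > 0.5) == bool(y) for x, y in data)
    return correct / len(data)


rng = random.Random(0)
pretrained = [0.0, 0.0, 0.0]                      # stand-in for a pre-trained checkpoint
general = make_data(2000, [1.0, 1.0, 0.0], rng)   # large general-task set
target = make_data(50, [1.0, 0.8, 0.3], rng)      # small target-domain set

# Step 1 (transfer): fine-tune on the large general-task dataset.
transferred = fine_tune(pretrained, general, epochs=2, lr=0.1)
# Step 2 (adapt): continue fine-tuning on the target-domain dataset.
adapted = fine_tune(transferred, target, epochs=3, lr=0.05)
```

The second call starts from `transferred` rather than from `pretrained`, which is the distinction between this scheme and fine-tuning directly on the target data.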

Code & Models

Datasets

AmazonScience/asnq · dataset · 984 downloads

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax