DoT: An efficient Double Transformer for NLP tasks with tables

Syrine Krichene; Thomas Müller; Julian Martin Eisenschlos

arXiv:2106.00479·cs.CL·June 2, 2021

TL;DR

The paper introduces DoT, a double transformer architecture that improves efficiency in NLP tasks with tables by combining a shallow pruning transformer with a deep task-specific transformer, reducing training and inference time significantly.

Contribution

The paper proposes a novel double transformer architecture, DoT, that decomposes NLP tasks with tables into two sub-tasks for enhanced efficiency without substantial accuracy loss.

Findings

01. DoT reduces training and inference time by at least 50%.

02. The pruning transformer effectively selects relevant tokens.

03. DoT maintains accuracy similar to that of slower baseline models.

Abstract

Transformer-based approaches have been successfully used to obtain state-of-the-art accuracy on natural language processing (NLP) tasks with semi-structured tables. These model architectures are typically deep, resulting in slow training and inference, especially for long inputs. To improve efficiency while maintaining a high accuracy, we propose a new architecture, DoT, a double transformer model, that decomposes the problem into two sub-tasks: A shallow pruning transformer that selects the top-K tokens, followed by a deep task-specific transformer that takes as input those K tokens. Additionally, we modify the task-specific attention to incorporate the pruning scores. The two transformers are jointly trained by optimizing the task-specific loss. We run experiments on three benchmarks, including entailment and question-answering. We show that for a small drop of accuracy, DoT improves…
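The two-stage design described in the abstract can be illustrated with a small, dependency-free sketch: a pruning step that keeps the top-K tokens by score, and an attention step that takes those scores into account. This is not the authors' implementation. The function names (`select_top_k`, `score_biased_attention`), the toy tokens and scores, and in particular the choice to add the log of the pruning score to the attention logits are assumptions made for illustration; the abstract only states that the task-specific attention is modified to incorporate the pruning scores.

```python
import math


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score_biased_attention(query, keys, values, pruning_scores):
    """Single-query dot-product attention with a pruning-score bias.

    ASSUMPTION: adding log(score) to each logit is one illustrative way to
    let pruning scores influence attention; it is not taken from the paper.
    Scores must be strictly positive.
    """
    d = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + math.log(s)
        for key, s in zip(keys, pruning_scores)
    ]
    weights = softmax(logits)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


# Hypothetical toy input: per-token scores from a (shallow) pruning model.
tokens = ["[CLS]", "which", "city", "Paris", "1889", "Rome"]
scores = [0.9, 0.2, 0.6, 0.8, 0.1, 0.3]

kept = select_top_k(scores, k=3)
kept_tokens = [tokens[i] for i in kept]  # ['[CLS]', 'city', 'Paris']
kept_scores = [scores[i] for i in kept]
```

Only the K kept tokens (and their scores) would be passed to the deeper task-specific model, which is where the efficiency gain for long inputs comes from. Keeping the scores inside the attention computation, rather than using them only for a hard selection, is one way a task-specific loss could still send a training signal back to the pruning step.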

Code & Models

Repositories

google-research/tapas (TensorFlow, official)

Taxonomy

Methods: Pruning