TL;DR
The paper introduces DoT, a double transformer architecture for NLP tasks with tables. It pairs a shallow pruning transformer with a deep task-specific transformer, cutting training and inference time by at least 50%.
Contribution
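The paper proposes DoT, a double transformer architecture that splits NLP tasks with tables into two sub-tasks: pruning the input to the top-K tokens, then running the task-specific model on those tokens. The two transformers are trained jointly on the task-specific loss, improving efficiency at the cost of only a small drop in accuracy.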
Findings
DoT reduces training and inference time by at least 50%.
The pruning transformer effectively selects relevant tokens.
DoT's accuracy stays close to that of slower baseline models, with only a small drop.
Abstract
Transformer-based approaches have been successfully used to obtain state-of-the-art accuracy on natural language processing (NLP) tasks with semi-structured tables. These model architectures are typically deep, resulting in slow training and inference, especially for long inputs. To improve efficiency while maintaining a high accuracy, we propose a new architecture, DoT, a double transformer model, that decomposes the problem into two sub-tasks: A shallow pruning transformer that selects the top-K tokens, followed by a deep task-specific transformer that takes as input those K tokens. Additionally, we modify the task-specific attention to incorporate the pruning scores. The two transformers are jointly trained by optimizing the task-specific loss. We run experiments on three benchmarks, including entailment and question-answering. We show that for a small drop of accuracy, DoT improves…
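The top-K selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_top_k`, the example tokens, and the hand-written scores are all assumptions made for illustration. In DoT the scores would come from the shallow pruning transformer, and the kept tokens (along with their scores) would be passed to the deep task-specific transformer.

```python
from typing import List, Sequence, Tuple


def select_top_k(
    tokens: Sequence[str], scores: Sequence[float], k: int
) -> Tuple[List[str], List[float]]:
    """Keep the k highest-scoring tokens, preserving their original order.

    Hypothetical helper: `scores` stands in for the output of a pruning
    model and is supplied by hand in this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(tokens))
    # Rank positions by descending score; sorted() is stable, so ties
    # keep the earlier position first.
    ranked = sorted(range(len(tokens)), key=lambda i: -scores[i])
    # Restore the original token order among the k kept positions.
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept], [scores[i] for i in kept]


# Illustrative input: a question followed by a few table cells.
tokens = ["[CLS]", "which", "city", "Paris", "1889", "Rome", "1960"]
scores = [0.9, 0.2, 0.8, 0.7, 0.1, 0.3, 0.05]  # assumed pruning scores

kept_tokens, kept_scores = select_top_k(tokens, scores, k=3)
print(kept_tokens)
print(kept_scores)
```

Returning the scores alongside the tokens leaves room for a downstream model to use them, as the abstract says DoT does in its task-specific attention; how they are incorporated is not specified in this summary, so the sketch stops at selection.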
Taxonomy
Methods: Pruning
