Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU

Patrick Kahardipraja; Brielen Madureira; David Schlangen

arXiv:2109.07364·cs.CL·May 3, 2024

TL;DR

This paper investigates the use of Linear Transformers for incremental natural language understanding, demonstrating improved efficiency and incremental performance over standard Transformers, with trade-offs in full-sequence accuracy.

Contribution

It provides an empirical analysis of Linear Transformers for incremental NLU, showing their advantages and how training strategies can improve partial output quality.

Findings

01 Linear Transformers outperform standard Transformers in incremental tasks.

02 Recurrent Linear Transformers offer faster inference and better incremental performance.

03 Training with input prefixes improves partial output accuracy.
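
As a rough illustration of what "training with input prefixes" could look like in practice, the sketch below expands one annotated sentence into one training example per prefix. The function names, the toy tokens, and the labelling scheme are hypothetical and are not taken from the paper or its repository. It assumes that for sequence labelling the label sequence is truncated together with the input, and that for sequence classification every prefix keeps the label of the full sequence.

```python
from typing import List, Tuple


def prefix_examples_tagging(
    tokens: List[str], labels: List[str]
) -> List[Tuple[List[str], List[str]]]:
    """Return one (prefix, prefix_labels) pair per prefix length.

    Assumption: labels are aligned one-to-one with tokens, so a prefix
    of the input pairs with the same-length prefix of the labels.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [(tokens[:t], labels[:t]) for t in range(1, len(tokens) + 1)]


def prefix_examples_classification(
    tokens: List[str], label: str
) -> List[Tuple[List[str], str]]:
    """Return one (prefix, label) pair per prefix length.

    Assumption: every prefix is trained against the label of the
    complete sequence.
    """
    return [(tokens[:t], label) for t in range(1, len(tokens) + 1)]
```

A sentence of n tokens yields n training examples under this scheme, so the training set grows with sentence length; whether and how to subsample prefixes is a design choice this sketch leaves open.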

Abstract

Incremental processing allows interactive systems to respond based on partial inputs, which is a desirable property e.g. in dialogue agents. The currently popular Transformer architecture inherently processes sequences as a whole, abstracting away the notion of time. Recent work attempts to apply Transformers incrementally via restart-incrementality by repeatedly feeding, to an unchanged model, increasingly longer input prefixes to produce partial outputs. However, this approach is computationally costly and does not scale efficiently for long sequences. In parallel, we witness efforts to make Transformers more efficient, e.g. the Linear Transformer (LT) with a recurrence mechanism. In this work, we examine the feasibility of LT for incremental NLU in English. Our results show that the recurrent LT model has better incremental performance and faster inference speed compared to the…
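
To make the cost argument concrete, the following minimal sketch contrasts restart-incrementality with a recurrent update for a single causal linear-attention head. It is an illustration under stated assumptions, not the authors' implementation: it uses the elu(x) + 1 feature map and the running-sum recurrence of the original Linear Transformer formulation, plain Python lists instead of tensors, and hypothetical names (`RecurrentLinearAttention`, `restart_incremental`, `recurrent_incremental`). Queries, keys, and values are assumed to be given directly, with no projections, layers, or multiple heads.

```python
import math


def phi(x):
    """Feature map elu(x) + 1, which is strictly positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


class RecurrentLinearAttention:
    """Causal linear attention kept as a running state (S, z)."""

    def __init__(self, d_k, d_v):
        self.S = [[0.0] * d_v for _ in range(d_k)]  # sum of phi(k) v^T
        self.z = [0.0] * d_k                        # sum of phi(k)
        self.steps = 0                              # tokens consumed so far

    def step(self, q, k, v):
        """Consume one token and return its attention output."""
        fq, fk = phi(q), phi(k)
        for i, fki in enumerate(fk):
            self.z[i] += fki
            for j, vj in enumerate(v):
                self.S[i][j] += fki * vj
        self.steps += 1
        denom = sum(a * b for a, b in zip(fq, self.z))
        return [
            sum(fq[i] * self.S[i][j] for i in range(len(fq))) / denom
            for j in range(len(v))
        ]


def restart_incremental(qs, ks, vs):
    """Reprocess every prefix from scratch; keep the newest output."""
    outputs, steps = [], 0
    for t in range(1, len(qs) + 1):
        model = RecurrentLinearAttention(len(ks[0]), len(vs[0]))  # fresh state
        for q, k, v in zip(qs[:t], ks[:t], vs[:t]):
            out = model.step(q, k, v)
        steps += model.steps
        outputs.append(out)
    return outputs, steps


def recurrent_incremental(qs, ks, vs):
    """Process each token once, carrying the state forward."""
    if not qs:
        return [], 0
    model = RecurrentLinearAttention(len(ks[0]), len(vs[0]))
    outputs = [model.step(q, k, v) for q, k, v in zip(qs, ks, vs)]
    return outputs, model.steps
```

For a sequence of n tokens, the restart variant consumes n(n+1)/2 tokens in total while the recurrent variant consumes n, and in this causal setting both produce the same output for each newest token. With a bidirectional model, restart-incrementality can additionally revise earlier outputs, which this sketch does not model.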

Code & Models

Repositories

pkhdipraja/towards-incremental-transformers
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Softmax