Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU

Brielen Madureira; David Schlangen

arXiv:2010.05330 · cs.CL · March 29, 2024

TL;DR

This paper empirically assesses how bidirectional encoders such as BiLSTMs and BERT perform on incremental natural language understanding tasks, where partial output must be produced from partial input, and explores ways to adapt them to that setting.

Contribution

It shows that bidirectional encoders can be used incrementally while retaining most of their non-incremental quality, and it proposes adaptations to training and testing that improve their incremental behavior.

Findings

01. Bidirectional models retain most of their non-incremental quality when used incrementally.

02. BERT is affected more by incremental access than the other models tested.

03. Modifications to training and testing can mitigate the performance drop in incremental settings.

Abstract

While humans process language incrementally, the best language encoders currently used in NLP do not. Both bidirectional LSTMs and Transformers assume that the sequence that is to be encoded is available in full, to be processed either forwards and backwards (BiLSTMs) or as a whole (Transformers). We investigate how they behave under incremental interfaces, when partial output must be provided based on partial input seen up to a certain time step, which may happen in interactive systems. We test five models on various NLU datasets and compare their performance using three incremental evaluation metrics. The results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality. The "omni-directional" BERT model, which achieves better non-incremental performance, is impacted more by the incremental access. This can be…
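The incremental interface described in the abstract can be illustrated with a minimal sketch: a non-incremental tagger is re-run on every prefix of the input, and the partial outputs are compared across time steps to see how often earlier labels get revised. Everything here is an assumption for illustration: `toy_tagger` is a hypothetical rule-based stand-in for a bidirectional encoder (it peeks at the next token, so its labels depend on right context), and `count_edits` is a simple revision count, not necessarily one of the three metrics used in the paper.

```python
from typing import Callable, List, Sequence, Tuple

# A tagger maps a full token sequence to one label per token.
Tagger = Callable[[Sequence[str]], List[str]]


def toy_tagger(tokens: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for a bidirectional encoder.

    The label for "book" depends on the token to its right, so the
    output for a prefix may differ from the output for the full input.
    """
    labels = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "book":
            # Verb reading only when a determiner follows.
            labels.append("VERB" if nxt in {"a", "the"} else "NOUN")
        elif tok in {"a", "the"}:
            labels.append("DET")
        else:
            labels.append("NOUN")
    return labels


def restart_incremental(tagger: Tagger, tokens: Sequence[str]) -> List[List[str]]:
    """Re-run the tagger from scratch on each prefix tokens[:t]."""
    return [tagger(tokens[:t]) for t in range(1, len(tokens) + 1)]


def count_edits(partial_outputs: List[List[str]]) -> Tuple[int, int]:
    """Return (additions, revisions) over the sequence of partial outputs.

    An addition is a label for a newly seen token; a revision is a
    change to a label that was already emitted at the previous step.
    """
    additions = 0
    revisions = 0
    prev: List[str] = []
    for out in partial_outputs:
        additions += len(out) - len(prev)
        revisions += sum(1 for old, new in zip(prev, out) if old != new)
        prev = out
    return additions, revisions


if __name__ == "__main__":
    outputs = restart_incremental(toy_tagger, ["book", "a", "flight"])
    for step, out in enumerate(outputs, start=1):
        print(step, out)
    print(count_edits(outputs))
```

In this toy run the label for "book" is emitted at the first step and revised once the second token arrives. Revisions of this kind are what an incremental evaluation has to account for when an encoder relies on right context.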

Code & Models

Repositories

briemadu/inc-bidirectional
PyTorch · Official

Taxonomy

Methods: Linear Layer · Cosine Annealing · WordPiece · Adam · Byte Pair Encoding · Softmax · Multi-Head Attention · Layer Normalization · Dense Connections · Linear Warmup With Cosine Annealing