To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging

Kasturi Bhattacharjee; Miguel Ballesteros; Rishita Anubhai; Smaranda Muresan; Jie Ma; Faisal Ladhak; Yaser Al-Onaizan

arXiv:2010.14042·cs.CL·October 28, 2020

TL;DR

This paper compares task-specific semi-supervised methods like Cross-View Training with task-agnostic BERT for sequence tagging, showing that lighter models can achieve similar performance with less cost and environmental impact.

Contribution

It demonstrates that task-specific semi-supervised approaches can match BERT's performance in sequence tagging while being more efficient and environmentally friendly.

Findings

1. CVT achieves comparable accuracy to BERT on sequence tagging tasks.

2. Lighter models like CVT reduce financial and environmental costs.

3. Task-specific semi-supervised methods are viable alternatives to BERT.

Abstract

Leveraging large amounts of unlabeled data using Transformer-like architectures, such as BERT, has gained popularity in recent times owing to their effectiveness in learning general representations that can then be fine-tuned for downstream tasks with much success. However, training these models can be costly from both an economic and an environmental standpoint. In this work, we investigate how to effectively use unlabeled data by exploring the task-specific semi-supervised approach, Cross-View Training (CVT), and comparing it with task-agnostic BERT in multiple settings that include domain- and task-relevant English data. CVT uses a much lighter model architecture, and we show that it achieves similar performance to BERT on a set of sequence tagging tasks, with lower financial and environmental impact.

Taxonomy

Methods: Linear Layer · Sigmoid Activation · Long Short-Term Memory · Bidirectional LSTM · Convolution · Tanh Activation · CNN Bidirectional LSTM · Layer Normalization · Softmax