SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations

Hooman Sedghamiz; Shivam Raval; Enrico Santus; Tuka Alhanai; Mohammad Ghassemi

arXiv:2109.07424 · cs.CL · September 16, 2021

TL;DR

SupCL-Seq extends supervised contrastive learning to NLP sequence representations. It generates augmented views of each sequence through dropout and significantly improves performance on GLUE tasks over a standard BERT baseline.
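To make the dropout-as-augmentation idea concrete, here is a minimal NumPy sketch, assuming a hypothetical toy encoder (dropout, then a linear layer, tanh, and L2 normalization) standing in for a Transformer. Encoding the same batch twice with freshly sampled dropout masks yields two slightly different views of every anchor. The names `dropout`, `encode`, and `make_views` are illustrative only and are not taken from the official repository.

```python
import numpy as np


def dropout(x, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale survivors by 1/(1-p)."""
    if p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


def encode(x, W, p, rng):
    """Hypothetical toy encoder (stand-in for a Transformer): dropout -> linear -> tanh -> L2-normalize."""
    h = np.tanh(dropout(x, p, rng) @ W)
    return h / (np.linalg.norm(h, axis=1, keepdims=True) + 1e-12)


def make_views(x, W, p=0.1, n_views=2, seed=0):
    """Encode the same batch n_views times; each pass draws a fresh dropout mask,
    so the resulting embeddings are slightly different 'views' of each anchor."""
    rng = np.random.default_rng(seed)
    return [encode(x, W, p, rng) for _ in range(n_views)]


# Toy usage: 4 "sequences" with 16-dim input features, projected to 8 dims.
data_rng = np.random.default_rng(42)
x = data_rng.normal(size=(4, 16))
W = data_rng.normal(size=(16, 8))
v1, v2 = make_views(x, W, p=0.1)
```

With `p=0.0` both passes are identical, which is why a non-zero dropout probability is what turns a single forward pass into an augmentation.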

Contribution

The paper adapts supervised contrastive learning from computer vision to NLP, demonstrating improved sequence representations and better downstream task performance.
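For reference, the standard supervised contrastive loss from computer vision (Khosla et al., 2020, the variant that averages over positives outside the log) is shown below. Whether SupCL-Seq uses exactly this variant, or adds extra scaling such as a base temperature, is an assumption not confirmed by this summary.

```latex
\mathcal{L}^{\mathrm{sup}} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)}
  \log \frac{\exp\left(z_i \cdot z_p / \tau\right)}{\sum_{a \in A(i)} \exp\left(z_i \cdot z_a / \tau\right)},
\qquad A(i) = I \setminus \{i\},
\qquad P(i) = \{\, p \in A(i) : y_p = y_i \,\}
```

Here I indexes every view in the batch (each anchor together with its altered views), z are L2-normalized embeddings, y are class labels, and τ is a temperature. Views of the same sequence share its label, so they are always positives for one another.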

Findings

1. Large gains on GLUE benchmark tasks, e.g., 6% on CoLA.
2. Consistent improvements over self-supervised contrastive methods.
3. The gains come from downstream-optimized sequence representations, not from augmentation alone.

Abstract

While contrastive learning has proven to be an effective training strategy in computer vision, Natural Language Processing (NLP) has only recently adopted it as a self-supervised alternative to Masked Language Modeling (MLM) for improving sequence representations. This paper introduces SupCL-Seq, which extends supervised contrastive learning from computer vision to the optimization of sequence representations in NLP. By altering the dropout mask probability in standard Transformer architectures, we generate augmented, altered views for every representation (anchor). A supervised contrastive loss is then used to maximize the system's capability of pulling together similar samples (e.g., anchors and their altered views) and pushing apart samples belonging to other classes. Despite its simplicity, SupCL-Seq leads to large gains in many sequence classification tasks on the…
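The loss step described in the abstract can be sketched in NumPy as follows, assuming the standard supervised contrastive formulation of Khosla et al. (2020), averaged over anchors rather than summed. The function name `supcon_loss` and the temperature default are illustrative assumptions, not the official implementation. Each anchor is pulled toward every other view that shares its label and pushed away from views of other classes.

```python
import numpy as np


def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss (Khosla et al., 2020 form), averaged over anchors.

    z: (N, d) L2-normalized embeddings of all views in the batch (N >= 2).
    labels: (N,) integer class labels; views of the same sequence share its label.
    Anchors with no positive in the batch are skipped.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    n = z.shape[0]
    sim = z @ z.T / tau                              # pairwise scaled similarities
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)          # exclude the anchor itself from A(i)
    # Numerically stable log-softmax over A(i) for each anchor row.
    row_max = np.max(sim, axis=1, keepdims=True)
    log_denom = row_max + np.log(np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    log_prob = sim - log_denom
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask  # P(i)
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0
    mean_log_prob_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())


# Toy usage: 4 labeled sequences, two views each (the second view is a small
# perturbation standing in for a dropout-altered forward pass).
rng = np.random.default_rng(0)
y = np.array([0, 1, 0, 1])
v1 = rng.normal(size=(4, 8))
v1 /= np.linalg.norm(v1, axis=1, keepdims=True)
v2 = v1 + 0.05 * rng.normal(size=(4, 8))
v2 /= np.linalg.norm(v2, axis=1, keepdims=True)
loss = supcon_loss(np.concatenate([v1, v2]), np.concatenate([y, y]))
```

Because the labels are duplicated across views, each anchor's positives include both its own altered view and every other same-class sample, which is what distinguishes this supervised objective from purely self-supervised contrastive learning.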

Code & Models

Repositories

hooman650/supcl-seq
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Contrastive Learning · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Label Smoothing · Adam · Residual Connection · Multi-Head Attention