SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations
Hooman Sedghamiz, Shivam Raval, Enrico Santus, Tuka Alhanai, Mohammad Ghassemi

TL;DR
SupCL-Seq brings supervised contrastive learning to NLP: it generates augmented views of each sequence through dropout and trains with a supervised contrastive loss, significantly improving performance on GLUE tasks over standard BERT models.
Contribution
It adapts supervised contrastive learning from vision to NLP, demonstrating improved sequence representations and downstream task performance.
Findings
Large gains over standard BERT on GLUE benchmark tasks, e.g., 6% on CoLA.
Consistent improvements over self-supervised contrastive methods.
The improvement stems from downstream-optimized representations, not from augmentation alone.
Abstract
While contrastive learning is proven to be an effective training strategy in computer vision, Natural Language Processing (NLP) is only recently adopting it as a self-supervised alternative to Masked Language Modeling (MLM) for improving sequence representations. This paper introduces SupCL-Seq, which extends the supervised contrastive learning from computer vision to the optimization of sequence representations in NLP. By altering the dropout mask probability in standard Transformer architectures, for every representation (anchor), we generate augmented altered views. A supervised contrastive loss is then utilized to maximize the system's capability of pulling together similar samples (e.g., anchors and their altered views) and pushing apart the samples belonging to the other classes. Despite its simplicity, SupCL-Seq leads to large gains in many sequence classification tasks on the…
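The two ingredients named in the abstract, dropout-based views and a supervised contrastive loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes fixed feature vectors as a stand-in for Transformer outputs, applies dropout directly to those vectors rather than inside an encoder, and uses the commonly cited form of the supervised contrastive loss with a temperature parameter. The function names `dropout_views` and `supervised_contrastive_loss`, the dropout rates, and the temperature value are all hypothetical choices made for illustration.

```python
import numpy as np


def dropout_views(features, dropout_rates, rng):
    """Return one noisy copy of `features` per dropout rate, stacked row-wise.

    Assumption: dropout is applied to the vectors directly (inverted dropout),
    standing in for forward passes through an encoder with different rates.
    """
    views = []
    for p in dropout_rates:
        keep = rng.random(features.shape) >= p  # True where a unit is kept
        views.append(features * keep / (1.0 - p))
    return np.concatenate(views, axis=0)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalised embeddings.

    For each anchor, every other sample with the same label is a positive;
    all remaining samples act as negatives in the softmax denominator.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    z = embeddings / (norms + 1e-12)
    sim = z @ z.T / temperature

    n = sim.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # an anchor never contrasts with itself

    # Numerically stable log-softmax over the other samples in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_denominator = np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    log_prob = sim - sim_max - log_denominator

    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # skip anchors that have no positive in the batch
    if not valid.any():
        return 0.0

    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Toy usage: 6 "sequence representations" from 3 classes, 3 views each.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 16))
labels = np.array([0, 0, 1, 1, 2, 2])

views = dropout_views(features, dropout_rates=(0.0, 0.1, 0.2), rng=rng)
view_labels = np.tile(labels, 3)  # views are stacked view-by-view
loss = supervised_contrastive_loss(views, view_labels)
print(f"supervised contrastive loss on toy batch: {loss:.4f}")
```

In this sketch, an anchor's altered views inherit its label, so they are treated as positives alongside other samples of the same class, while samples of other classes only appear in the denominator and are pushed away as the loss decreases.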
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Contrastive Learning · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Label Smoothing · Adam · Residual Connection · Multi-Head Attention
