Pretrained Language Models for Sequential Sentence Classification

Arman Cohan; Iz Beltagy; Daniel King; Bhavana Dalvi; Daniel S. Weld

arXiv:1909.04054·cs.CL·March 24, 2021

TL;DR

This paper shows that pretrained language models such as BERT can classify each sentence of a document in context, capturing dependencies between sentences without hierarchical encoders or CRFs, and achieving state-of-the-art results on four datasets.

Contribution

The study shows that a BERT-based model can replace hierarchical encoders and CRF layers for sequential sentence classification, simplifying the architecture while still reaching state-of-the-art results.

Findings

01 Achieved state-of-the-art results on four datasets.

02 Developed a joint sentence representation for BERT.

03 Validated effectiveness on structured scientific abstracts.

Abstract

As a step toward better document-level understanding, we explore classification of a sequence of sentences into their corresponding categories, a task that requires understanding sentences in the context of the document. Recent successful models for this task have used hierarchical models to contextualize sentence representations, and Conditional Random Fields (CRFs) to incorporate dependencies between subsequent labels. In this work, we show that pretrained language models, BERT (Devlin et al., 2018) in particular, can be used for this task to capture contextual dependencies without the need for hierarchical encoding or a CRF. Specifically, we construct a joint sentence representation that allows BERT Transformer layers to directly utilize contextual information from all words in all sentences. Our approach achieves state-of-the-art results on four datasets, including a new dataset of…
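The joint sentence representation described in the abstract can be illustrated with a small sketch. It assumes that the sentences of a document are concatenated into one input sequence with a [SEP] delimiter after each sentence, so that a Transformer encoder attends over all words in all sentences at once, and that the position of each delimiter marks where a per-sentence label would be read off. The function name `build_joint_input` and the whitespace tokenization are hypothetical stand-ins for illustration; this is not the authors' released code.

```python
from typing import List, Tuple

CLS, SEP = "[CLS]", "[SEP]"  # BERT-style special tokens


def build_joint_input(sentences: List[str]) -> Tuple[List[str], List[int]]:
    """Concatenate sentences into one sequence and record each [SEP] index.

    Assumption: whitespace splitting stands in for a real subword tokenizer.
    """
    tokens = [CLS]
    sep_positions = []
    for sentence in sentences:
        tokens.extend(sentence.split())
        sep_positions.append(len(tokens))  # index the next [SEP] will occupy
        tokens.append(SEP)
    return tokens, sep_positions


tokens, sep_positions = build_joint_input(["We study X .", "Results improve ."])
print(tokens)
print(sep_positions)
```

Under these assumptions, a classifier would take the encoder output at each index in `sep_positions` as the representation of the sentence that precedes it, which is why no separate hierarchical encoder is needed to contextualize sentences.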

Code & Models

Repositories

allenai/sequential_sentence_classification
PyTorch · Official

Datasets

allenai/csabstruct
dataset · 58 dl

Taxonomy

Methods

Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Conditional Random Field · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding