Pretrained Language Models for Sequential Sentence Classification
Arman Cohan, Iz Beltagy, Daniel King, Bhavana Dalvi, Daniel S. Weld

TL;DR
This paper demonstrates that pretrained language models like BERT can effectively classify sequences of sentences within documents, capturing contextual dependencies without hierarchical models or CRFs, and achieves state-of-the-art results.
Contribution
The study shows that BERT-based models can replace hierarchical encoders and CRF layers for sequential sentence classification, simplifying the architecture while still reaching state-of-the-art results on four datasets.
Findings
Achieved state-of-the-art results on four datasets.
Developed a joint sentence representation for BERT.
Validated effectiveness on structured scientific abstracts.
Abstract
As a step toward better document-level understanding, we explore classification of a sequence of sentences into their corresponding categories, a task that requires understanding sentences in the context of the document. Recent successful models for this task have used hierarchical models to contextualize sentence representations, and Conditional Random Fields (CRFs) to incorporate dependencies between subsequent labels. In this work, we show that pretrained language models, BERT (Devlin et al., 2018) in particular, can be used for this task to capture contextual dependencies without the need for hierarchical encoding or a CRF. Specifically, we construct a joint sentence representation that allows BERT Transformer layers to directly utilize contextual information from all words in all sentences. Our approach achieves state-of-the-art results on four datasets, including a new dataset of…
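The joint sentence representation mentioned in the abstract can be illustrated with a small sketch. The summary above does not spell out the input format, so the layout here is an assumption: all sentences of a document are concatenated into one sequence, each followed by a `[SEP]` token, and the position of each `[SEP]` is recorded as the place where a per-sentence label could be read off the encoder output. The function name `build_joint_input` and the toy sentences are hypothetical, and no real tokenizer or BERT model is invoked.

```python
from typing import List, Tuple

CLS, SEP = "[CLS]", "[SEP]"


def build_joint_input(sentences: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Concatenate tokenized sentences into a single sequence.

    Returns the joint token list and, for each sentence, the index of the
    separator token that follows it. Assumption: a classifier would read the
    encoder output at these indices to label each sentence.
    """
    tokens = [CLS]
    sep_positions = []
    for sentence in sentences:
        tokens.extend(sentence)
        sep_positions.append(len(tokens))  # index the next [SEP] will occupy
        tokens.append(SEP)
    return tokens, sep_positions


# Toy, pre-tokenized "abstract" with three sentences (illustrative only).
abstract = [
    ["we", "study", "sentence", "classification"],
    ["our", "model", "uses", "bert"],
    ["it", "achieves", "strong", "results"],
]

tokens, sep_positions = build_joint_input(abstract)
print(tokens)
print(sep_positions)
```

Because every sentence sits in the same input sequence, self-attention can relate any word to any other word in the document, which is the property the abstract credits for removing the need for a hierarchical encoder or a CRF.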
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Conditional Random Field · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding
