MeLT: Message-Level Transformer with Masked Document Representations as Pre-Training for Stance Detection
Matthew Matero, Nikita Soni, Niranjan Balasubramanian, and H. Andrew Schwartz

TL;DR
MeLT is a hierarchical message-level transformer pre-trained on Twitter data. It improves stance detection by modeling sequences of messages and reconstructing masked message vectors, reaching a reported 67% F1.
Contribution
Introduces MeLT, a novel message-level transformer pre-trained with masked message vector reconstruction for stance detection in social media.
Findings
Achieves 67% F1 score on stance detection.
Effective modeling of message sequences improves attribute prediction.
Pre-training with masked message vectors enhances downstream task performance.
Abstract
Much of natural language processing is focused on leveraging large-capacity language models, typically trained over single messages with a task of predicting one or more tokens. However, modeling human language at higher levels of context (i.e., sequences of messages) is under-explored. In stance detection and other social media tasks where the goal is to predict an attribute of a message, we have contextual data that is loosely semantically connected by authorship. Here, we introduce Message-Level Transformer (MeLT), a hierarchical message-encoder pre-trained over Twitter and applied to the task of stance prediction. We focus on stance prediction as a task benefiting from knowing the context of the message (i.e., the sequence of previous messages). The model is trained using a variant of masked-language modeling, where instead of predicting tokens, it seeks to generate an entire…
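To make the pre-training idea concrete, the sketch below masks one message vector in a user's message sequence and scores a reconstruction of it. It is an illustration only, not the authors' implementation: the zero mask vector, the mean squared error loss, the `mask_prob` value, and the `mean_context_predictor` stand-in (which replaces the message-level transformer) are all assumptions made for this example, as are the function names.

```python
import random


def mask_messages(message_vectors, mask_prob=0.15, seed=0):
    """Replace a random subset of message vectors with a zero vector.

    The zero mask vector and mask_prob are assumptions for illustration.
    Returns the masked sequence and the list of masked positions.
    """
    rng = random.Random(seed)
    n = len(message_vectors)
    dim = len(message_vectors[0])
    masked_idx = [i for i in range(n) if rng.random() < mask_prob]
    if not masked_idx:
        # Always mask at least one message so there is something to reconstruct.
        masked_idx = [rng.randrange(n)]
    masked = [
        [0.0] * dim if i in masked_idx else list(vec)
        for i, vec in enumerate(message_vectors)
    ]
    return masked, masked_idx


def mean_context_predictor(masked, masked_idx):
    """Hypothetical stand-in for the message-level model.

    Predicts every masked vector as the mean of the unmasked vectors.
    """
    dim = len(masked[0])
    visible = [vec for i, vec in enumerate(masked) if i not in masked_idx]
    if visible:
        mean = [sum(col) / len(visible) for col in zip(*visible)]
    else:
        mean = [0.0] * dim
    return [
        list(mean) if i in masked_idx else list(vec)
        for i, vec in enumerate(masked)
    ]


def reconstruction_loss(predicted, target, masked_idx):
    """Mean squared error over masked positions only (assumed loss)."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy sequence of four 2-dimensional message vectors from one author.
messages = [[0.1, 0.3], [0.2, 0.1], [0.4, 0.5], [0.0, 0.2]]
masked, masked_idx = mask_messages(messages, mask_prob=0.25, seed=0)
predicted = mean_context_predictor(masked, masked_idx)
loss = reconstruction_loss(predicted, messages, masked_idx)
print(f"masked positions: {masked_idx}, reconstruction loss: {loss:.4f}")
```

In a real setup the predictor would be a trainable sequence model over message vectors, and the loss would drive its parameter updates; the toy predictor here only shows where that model would sit in the loop.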
Taxonomy
Topics: Topic Modeling · Authorship Attribution and Profiling · Misinformation and Its Impacts
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Softmax · Byte Pair Encoding · Layer Normalization
