NTULM: Enriching Social Media Text Representations with Non-Textual Units
Jinning Li, Shubhanshu Mishra, Ahmed El-Kishky, Sneha Mehta, and Vivek Kulkarni

TL;DR
This paper introduces a method to incorporate non-textual units like hashtags and mentions into social media text representations, significantly improving NLP task performance by enriching context beyond raw text.
Contribution
The study constructs an NTU-centric social heterogeneous network and fine-tunes a language model with NTU embeddings, enhancing social media NLP by leveraging social context.
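As a rough illustration of what an NTU-centric heterogeneous network could look like, the sketch below builds typed nodes (author, hashtag, mention, URL) and links any two NTUs that appear on the same post. The co-occurrence edge rule, the `build_ntu_graph` name, and the post dictionary layout are all assumptions made for this example; the summary does not specify how the paper defines its edges.

```python
from collections import Counter
from itertools import combinations


def build_ntu_graph(posts):
    """Return weighted co-occurrence edges between typed NTU nodes.

    Hypothetical helper: edge definition (co-occurrence on one post)
    is an assumption, not the paper's stated construction.
    """
    edges = Counter()
    for post in posts:
        # Each node is a (type, value) pair, so a hashtag and a mention
        # sharing the same string stay distinct nodes.
        nodes = {("author", post["author"])}
        nodes.update(("hashtag", h.lower()) for h in post.get("hashtags", []))
        nodes.update(("mention", m.lower()) for m in post.get("mentions", []))
        nodes.update(("url", u) for u in post.get("urls", []))
        # Sort so each undirected edge has one canonical key.
        for a, b in combinations(sorted(nodes), 2):
            edges[(a, b)] += 1
    return edges


# Toy posts, invented for illustration only.
posts = [
    {"author": "u1", "hashtags": ["NLP"], "mentions": ["u2"]},
    {"author": "u3", "hashtags": ["nlp", "ACL"]},
]
graph = build_ntu_graph(posts)
```

A graph-embedding method of the reader's choice could then be trained on these weighted edges so that all NTU types share one embedding space.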
Findings
NTU-augmented representations outperform text-only baselines by a relative 2-5% on many downstream tasks
Including NTU context in initial layers yields better results
Enables holistic social media content embeddings
Abstract
On social media, additional context is often present in the form of annotations and metadata such as the post's author, mentions, hashtags, and hyperlinks. We refer to these annotations as Non-Textual Units (NTUs). We posit that NTUs provide social context beyond their textual semantics and that leveraging these units can enrich social media text representations. In this work, we construct an NTU-centric social heterogeneous network to co-embed NTUs. We then principally integrate these NTU embeddings into a large pretrained language model by fine-tuning with these additional units. This adds context to noisy short-text social media. Experiments show that utilizing NTU-augmented text representations significantly outperforms existing text-only baselines by 2-5% relative points on many downstream tasks, highlighting the importance of context to social media NLP. We also highlight that…
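One possible way to feed NTU embeddings into a language model at the input layer is sketched below: the NTU vectors attached to a post are mean-pooled, linearly projected to the token-embedding width, and appended to the token sequence as one extra position. The pooling choice, the function names, and the toy numbers are assumptions for illustration; the abstract does not describe the exact fusion mechanism.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def project(vector, weight):
    """Apply a linear map; weight has shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def augment_input(token_embeddings, ntu_embeddings, weight):
    """Append one projected NTU position to the token-embedding sequence.

    Hypothetical fusion step: mean pooling plus a linear projection
    is an assumed design, not the paper's confirmed architecture.
    """
    if not ntu_embeddings:
        # Posts without NTUs fall back to the text-only input.
        return list(token_embeddings)
    pooled = mean_vector(ntu_embeddings)
    return list(token_embeddings) + [project(pooled, weight)]


# Toy values: 2-d token embeddings, 3-d NTU embeddings.
tokens = [[0.1, 0.2], [0.3, 0.4]]
ntus = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # maps 3-d to 2-d
augmented = augment_input(tokens, ntus, weight)
```

In a real setup the projection would be a trainable layer learned during fine-tuning, and the augmented sequence would pass through the transformer as usual.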
Taxonomy
Topics: Topic Modeling · Mental Health via Writing · Sentiment Analysis and Opinion Mining
