TL;DR
This paper investigates the importance of Repetitive Lengthening Form (RLF) in sentiment analysis, introduces a large dataset, and proposes a tuning framework to enhance language models' understanding and explainability of RLF.
Contribution
It presents the first multi-domain RLF dataset, a novel instruction tuning framework, and a method to quantify language models' understanding of informal expressions.
Findings
RLF sentences are expressive and indicative of sentiment.
Fine-tuned models outperform zero-shot GPT-4.
ExpInstruct improves open-source LLMs to match GPT-4 in performance and explainability.
Abstract
Individuals engaging in online communication frequently express personal opinions with informal styles (e.g., memes and emojis). While Language Models (LMs) with informal communications have been widely discussed, a unique and emphatic style, the Repetitive Lengthening Form (RLF), has been overlooked for years. In this paper, we explore answers to two research questions: 1) Is RLF important for sentiment analysis (SA)? 2) Can LMs understand RLF? Inspired by previous linguistic research, we curate Lengthening, the first multi-domain dataset with 850k samples focused on RLF for SA. Moreover, we introduce Explainable Instruction Tuning (ExpInstruct), a two-stage instruction tuning framework aimed to improve both performance and explainability of LLMs for RLF. We further propose a novel unified approach to quantify LMs' understanding of informal…
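To make the idea of RLF concrete, here is a minimal sketch of how repeated-character lengthening (as in "sooooo good!!!") could be spotted and collapsed in raw text. It is not the authors' curation pipeline: the threshold of three or more identical characters and the function names `find_lengthened_runs` and `collapse_lengthening` are assumptions made only for illustration.

```python
import re

# Assumption: a run of 3+ identical non-space characters counts as lengthening.
# The paper's own criterion for RLF may differ.
_RUN = re.compile(r"(\S)\1{2,}")


def find_lengthened_runs(text):
    """Return (character, run_length) for each run of 3+ identical characters."""
    return [(m.group(1), len(m.group(0))) for m in _RUN.finditer(text)]


def collapse_lengthening(text, keep=1):
    """Shrink each detected run to `keep` copies of its character."""
    return _RUN.sub(lambda m: m.group(1) * keep, text)


if __name__ == "__main__":
    sample = "This is sooooo good!!!"
    print(find_lengthened_runs(sample))
    print(collapse_lengthening(sample))
```

A rule this simple also fires on strings such as "1000" or "www", so any real filter would need extra checks on top of it.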
