Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis Models

Guangyu Xie; Yice Zhang; Jianzhu Bao; Qianlong Wang; Yang Sun; Bingbing Wang; Ruifeng Xu

arXiv:2510.24425·cs.CL·November 4, 2025

TL;DR

This paper introduces CompEffDist, a novel distillation framework for sentiment analysis that improves data efficiency and model performance by automating instruction creation and filtering large-scale data.

Contribution

It proposes attribute-based automatic instruction construction and difficulty-based data filtering to enhance knowledge distillation for lightweight sentiment models.

Findings

01 3B models match the performance of teachers 20x larger

02 Achieves the same accuracy with only 10% of the data

03 Outperforms baseline methods in efficiency

Abstract

Recent efforts leverage knowledge distillation techniques to develop lightweight and practical sentiment analysis models. These methods are grounded in human-written instructions and large-scale user texts. Despite the promising results, two key challenges remain: (1) manually written instructions are limited in diversity and quantity, making them insufficient to ensure comprehensive coverage of distilled knowledge; (2) large-scale user texts incur high computational cost, hindering the practicality of these methods. To this end, we introduce CompEffDist, a comprehensive and efficient distillation framework for sentiment analysis. Our framework consists of two key modules: attribute-based automatic instruction construction and difficulty-based data filtering, which correspondingly tackle the aforementioned challenges. Applying our method across multiple model series (Llama-3, Qwen-3,…
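
The difficulty-based data filtering module named in the abstract can be illustrated with a minimal sketch. The abstract does not say how difficulty is measured or which examples are retained, so everything below is an assumption: each text is taken to carry a precomputed difficulty score (for instance, a student model's loss on the teacher's output), the hardest examples are kept, and `filter_by_difficulty` and `keep_fraction` are hypothetical names. The default of 0.1 only mirrors the 10% figure reported in the findings.

```python
from typing import List, Tuple


def filter_by_difficulty(
    scored_texts: List[Tuple[str, float]],
    keep_fraction: float = 0.1,
) -> List[str]:
    """Keep the hardest `keep_fraction` of texts.

    Hypothetical helper: the (text, difficulty) pairing and the
    "keep the hardest" rule are assumptions, not the paper's procedure.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Rank from hardest to easiest; sorted() is stable, so ties keep input order.
    ranked = sorted(scored_texts, key=lambda pair: pair[1], reverse=True)
    # Keep at least one example whenever the input is non-empty.
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return [text for text, _ in ranked[:n_keep]]
```

A call such as `filter_by_difficulty(pairs, keep_fraction=0.5)` returns the texts from the harder half of `pairs`; the surviving subset would then be the only data passed to the teacher for distillation, which is where the cost saving would come from under these assumptions.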

Code & Models

Datasets

zhang-yice/sentiment-distillation-v2 (dataset, 7 downloads)

Videos

Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis Models· underline