Multi-View Active Learning for Short Text Classification in User-Generated Data

Payam Karisani; Negin Karisani; Li Xiong

arXiv:2112.02611·cs.CL·December 22, 2022·1 cites

Open Access

TL;DR

This paper introduces a novel multi-view active learning approach tailored for short, informal user-generated texts, effectively addressing data scarcity and noise in tasks like disaster report detection.

Contribution

It is the first to apply multi-view active learning with Parzen-Rosenblatt integration and query-by-committee strategies to short, noisy user-generated data.

Findings

01. Outperforms existing models on Twitter datasets

02. Demonstrates high consistency across different applications

03. Effectively handles noisy, short texts in active learning

Abstract

Mining user-generated data often suffers from a lack of labeled data, short document lengths, and informal user language. In this paper, we propose a novel active learning model to overcome these obstacles in tasks tailored to query phrases--e.g., detecting positive reports of natural disasters. Our model has three novelties: 1) It is the first approach to employ multi-view active learning in this domain. 2) It uses the Parzen-Rosenblatt window method to integrate the representativeness measure into multi-view active learning. 3) It employs a query-by-committee strategy, based on the agreement between predictors, to address the usually noisy language of the documents in this domain. We evaluate our model on four publicly available Twitter datasets with distinctly different applications. We also compare our model with a wide range of baselines including those with…
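To make two of the ingredients named in the abstract more concrete, here is a minimal sketch of how a Parzen-Rosenblatt density estimate could serve as a representativeness weight on top of a committee-based informativeness score. It is not the authors' implementation: the excerpt does not give their scoring rule, so the Gaussian kernel, the use of vote entropy as the committee signal, the product used to combine the two, and all function names (`parzen_density`, `vote_entropy`, `select_query`) are assumptions made for illustration.

```python
import numpy as np


def parzen_density(points, bandwidth=1.0):
    """Parzen-Rosenblatt density estimate at each point (Gaussian kernel).

    Assumption: each point is included in its own estimate, which is
    acceptable for ranking candidates against each other.
    """
    points = np.asarray(points, dtype=float)
    _, dim = points.shape
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2)) / norm
    return kernel.mean(axis=1)


def vote_entropy(votes, n_classes):
    """Entropy of committee votes per sample.

    votes has shape (n_members, n_samples) and holds integer labels.
    Zero means full agreement; larger values mean more disagreement.
    """
    votes = np.asarray(votes)
    n_members = votes.shape[0]
    counts = np.stack(
        [(votes == c).sum(axis=0) for c in range(n_classes)], axis=1
    )
    probs = counts / n_members
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(probs > 0, probs * np.log(probs), 0.0)
    return -terms.sum(axis=1)


def select_query(points, votes, n_classes=2, bandwidth=1.0):
    """Pick the unlabeled sample with the highest combined score.

    Assumption: the score is disagreement times density, so a contested
    sample in a dense region is preferred over a contested outlier.
    """
    scores = vote_entropy(votes, n_classes) * parzen_density(points, bandwidth)
    return int(np.argmax(scores)), scores


# Toy pool: three nearby samples and one outlier, in a 2-D feature space.
pool = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
# Two committee members (for example, one classifier per view).
committee_votes = [[0, 1, 0, 1],
                   [0, 0, 0, 0]]
chosen, scores = select_query(pool, committee_votes)
print(chosen, scores)
```

In this toy pool the committee disagrees on the second sample and on the outlier; weighting by density favors the second sample because it sits in the dense cluster. This is the usual motivation for adding a representativeness term: it keeps the learner from spending labels on isolated, possibly noisy documents.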

Taxonomy

Topics: Text and Document Classification Technologies · Machine Learning and Algorithms · Topic Modeling

Methods: Balanced Selection