Multi-View Active Learning for Short Text Classification in User-Generated Data
Payam Karisani, Negin Karisani, Li Xiong

TL;DR
This paper introduces a novel multi-view active learning approach tailored for short, informal user-generated texts, effectively addressing data scarcity and noise in tasks like disaster report detection.
Contribution
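It is the first approach to apply multi-view active learning to short, noisy user-generated data. It integrates a representativeness measure through the Parzen-Rosenblatt window method and uses a query-by-committee strategy based on the agreement between predictors.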
Findings
Outperforms existing models on four publicly available Twitter datasets
Performs consistently across datasets with distinctly different applications
Effectively handles noisy, short texts in active learning
Abstract
Mining user-generated data often suffers from the lack of enough labeled data, short document lengths, and the informal user language. In this paper, we propose a novel active learning model to overcome these obstacles in the tasks tailored for query phrases--e.g., detecting positive reports of natural disasters. Our model has three novelties: 1) It is the first approach to employ multi-view active learning in this domain. 2) It uses the Parzen-Rosenblatt window method to integrate the representativeness measure into multi-view active learning. 3) It employs a query-by-committee strategy, based on the agreement between predictors, to address the usually noisy language of the documents in this domain. We evaluate our model in four publicly available Twitter datasets with distinctly different applications. We also compare our model with a wide range of baselines including those with…
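The two ingredients named in the abstract, a Parzen-Rosenblatt density estimate as a representativeness measure and a committee of predictors, can be illustrated with a minimal sketch. This is not the authors' model: the Gaussian kernel, the vote-entropy disagreement score, the product used to combine the two, and the function names `parzen_density`, `vote_entropy`, and `select_query` are all assumptions chosen for illustration, since the abstract does not give the exact selection criterion.

```python
import numpy as np

def parzen_density(X, bandwidth=1.0):
    """Gaussian Parzen-Rosenblatt density estimate at each row of X.

    Every row of X serves as a kernel center (assumption: isotropic
    Gaussian kernel with a single shared bandwidth).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return kernel.sum(axis=1) / (n * norm)

def vote_entropy(votes, n_classes):
    """Committee disagreement per sample.

    votes has shape (n_members, n_samples) and holds integer class labels.
    Returns the entropy of the vote distribution for each sample
    (0 when all members agree).
    """
    votes = np.asarray(votes)
    n_members = votes.shape[0]
    counts = np.stack(
        [(votes == c).sum(axis=0) for c in range(n_classes)], axis=1
    )
    p = counts / n_members
    # Replace zero probabilities by 1 so that log() contributes 0 for them.
    safe_p = np.where(p > 0, p, 1.0)
    return -(p * np.log(safe_p)).sum(axis=1)

def select_query(X, votes, n_classes=2, bandwidth=1.0):
    """Index of the unlabeled sample to send to the annotator.

    Hypothetical scoring rule: disagreement weighted by density, so that
    contested samples in dense regions are preferred over contested outliers.
    """
    score = vote_entropy(votes, n_classes) * parzen_density(X, bandwidth)
    return int(np.argmax(score))
```

With this scoring rule, two samples that split the committee equally are ranked by how densely populated their neighborhood is, so an isolated noisy document is less likely to be queried than a contested document that resembles many others.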
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning and Algorithms · Topic Modeling
Methods: Balanced Selection
