Efficient Classification of Multi-Labelled Text Streams by Clashing
Ricardo Ñanculef, Ilias Flaounas, Nello Cristianini

TL;DR
This paper introduces a novel online method called Clashing for classifying multi-labelled text streams efficiently, using constant memory and processing time, and demonstrates its competitive performance on real-world data.
Contribution
The paper presents a new online classification approach that maps text into a low-dimensional space, partitions that space into regions, and keeps per-region label statistics for multi-label prediction, making it suitable for virtually infinite data streams.
Findings
Achieves competitive F-measures while using fewer computational resources.
Outperforms comparable methods in macro-averaged F-measure.
Learns faster from partially labelled streams.
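Several findings are stated in terms of the F-measure, and the macro-averaged variant in particular. The following Python sketch shows what that metric rewards. It assumes the balanced F1 score, because the summary does not say which F variant is used, and the toy label sets are invented for illustration rather than taken from the paper. Macro-averaging computes F1 per label and then averages the per-label scores, so a rare label that is always missed pulls the score down as much as a frequent one. Micro-averaging instead pools the counts across all labels.

```python
def f1(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); 0.0 if the label has no counts at all.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """gold, pred: lists of label sets, one pair per document."""
    labels = set().union(*gold, *pred)
    tp = dict.fromkeys(labels, 0)
    fp = dict.fromkeys(labels, 0)
    fn = dict.fromkeys(labels, 0)
    for g, p in zip(gold, pred):
        for lab in p & g:
            tp[lab] += 1
        for lab in p - g:
            fp[lab] += 1
        for lab in g - p:
            fn[lab] += 1
    # Macro: average per-label F1, so every label counts equally.
    macro = sum(f1(tp[lab], fp[lab], fn[lab]) for lab in labels) / len(labels)
    # Micro: pool counts over all labels, so frequent labels dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Toy data: the classifier always predicts the frequent label "a".
gold = [{"a"}, {"a", "b"}, {"a"}, {"c"}]
pred = [{"a"}, {"a"}, {"a"}, {"a"}]
micro, macro = micro_macro_f1(gold, pred)
print(round(micro, 3), round(macro, 3))  # 0.667 0.286
```

In this example the micro score looks reasonable because the frequent label dominates the pooled counts. The macro score exposes that the rare labels "b" and "c" are never predicted.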
Abstract
We present a method for the classification of multi-labelled text documents explicitly designed for data stream applications that require processing a virtually infinite sequence of data using constant memory and constant processing time. Our method is composed of an online procedure used to efficiently map text into a low-dimensional feature space and a partition of this space into a set of regions for which the system extracts and keeps statistics used to predict multi-label text annotations. Documents are fed into the system as a sequence of words, mapped to a region of the partition, and annotated using the statistics computed from the labelled instances colliding in the same region. This approach is referred to as clashing. We illustrate the method on real-world text data, comparing the results with those obtained using other text classifiers. In addition, we provide an analysis…
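The abstract does not specify which low-dimensional mapping or partition the authors use. The Python sketch below is therefore only an illustration of the clashing idea under explicit assumptions, and it is not the paper's method. In place of the paper's mapping, it substitutes a SimHash-style signed random projection: each word gets a hash-derived ±1 vector of length `k`, and these vectors are summed as the words stream in. The sign pattern of the sum then selects one of `2**k` regions. For prediction, it assigns every label carried by at least `threshold` of the labelled documents that collided in that region. The class `ClashingSketch` and its parameters `k`, `seed` and `threshold` are hypothetical. Memory stays bounded regardless of stream length. Each document needs only `k` running sums, and the model holds at most `2**k` regions with one counter per label in each.

```python
import hashlib
from collections import defaultdict


class ClashingSketch:
    """Hypothetical illustration of clashing, not the authors' code.

    Assumption: a SimHash-style random projection maps each document to one
    of 2**k regions; each region keeps label counts of labelled documents.
    """

    def __init__(self, k=8, seed=0):
        if not 1 <= k <= 256:  # SHA-256 supplies 256 bits per word
            raise ValueError("k must be between 1 and 256")
        self.k = k
        self.seed = seed
        self.doc_counts = defaultdict(int)  # region -> labelled docs seen
        self.label_counts = defaultdict(lambda: defaultdict(int))  # region -> label -> count

    def _word_vector(self, word):
        # Deterministic +/-1 vector of length k derived from a hash of (seed, word).
        digest = hashlib.sha256(f"{self.seed}:{word}".encode("utf-8")).digest()
        return [1 if (digest[i // 8] >> (i % 8)) & 1 else -1 for i in range(self.k)]

    def region(self, words):
        # Consume the document word by word, keeping only k running sums.
        acc = [0] * self.k
        for word in words:
            for i, v in enumerate(self._word_vector(word)):
                acc[i] += v
        # Region index: bit i is set when the i-th projection is positive.
        return sum(1 << i for i, a in enumerate(acc) if a > 0)

    def update(self, words, labels):
        # A labelled document updates the statistics of the region it falls in.
        r = self.region(words)
        self.doc_counts[r] += 1
        for label in labels:
            self.label_counts[r][label] += 1

    def predict(self, words, threshold=0.5):
        # Annotate with labels carried by at least `threshold` of the labelled
        # documents that previously collided in the same region.
        r = self.region(words)
        n = self.doc_counts.get(r, 0)
        if n == 0:
            return set()
        counts = self.label_counts.get(r, {})
        return {label for label, c in counts.items() if c / n >= threshold}


clf = ClashingSketch(k=4, seed=1)
doc = "stocks fell on wall street".split()
clf.update(doc, {"business"})
clf.update(doc, {"business", "finance"})
print(sorted(clf.predict(doc)))  # ['business', 'finance']
```

In a partially labelled stream, an unlabelled document would only be passed to `predict`. Each labelled document would also be passed to `update`, so every step costs time proportional to the document length times `k`.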
