# Subject Specific Stream Classification Preprocessing Algorithm for Twitter Data Stream

**Authors:** Nisansa de Silva, Danaja Maldeniya, Chamilka Wijeratne

arXiv: 1705.09995 · 2017-05-30

## TL;DR

This paper presents a preprocessing algorithm that classifies the Twitter data stream into a given number of mutually exclusive, collectively exhaustive subject-specific streams, so that a data mining algorithm can be run on each stream separately with more relevant results and higher efficiency.

## Contribution

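The paper introduces a subject-specific stream classification algorithm that acts as a preprocessing step: rather than mining the raw Twitter stream, whose items cover widely varied subjects, it first splits the stream by subject and then applies the data mining algorithm to each resulting stream.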

## Key findings

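- Running a data mining algorithm directly on raw Twitter data is inefficient because of the wide variety of subjects mentioned in each data item
- The proposed algorithm classifies the entire stream into a given number of mutually exclusive, collectively exhaustive streams
- Running the data mining algorithm on each stream separately is reported to yield more relevant results, such as for sentiment analysis, with high efficiency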

## Abstract

The micro-blogging service Twitter is a lucrative source for data mining applications on global sentiment. However, due to the omnifariousness of the subjects mentioned in each data item, it is inefficient to run a data mining algorithm on the raw data. This paper discusses an algorithm to accurately classify the entire stream into a given number of mutually exclusive, collectively exhaustive streams, upon each of which the data mining algorithm can be run separately, yielding more relevant results with high efficiency.
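To illustrate what "mutually exclusive, collectively exhaustive" streams mean in practice, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not describe in detail. It assumes a simple keyword-overlap classifier, and the names `SUBJECT_KEYWORDS`, `FALLBACK`, `classify`, and `split_stream`, as well as the keyword sets themselves, are hypothetical. Each tweet is routed to exactly one stream (mutual exclusivity), and a catch-all stream receives anything that matches no subject (collective exhaustiveness).

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical subject vocabularies; the paper does not specify these.
SUBJECT_KEYWORDS: Dict[str, Set[str]] = {
    "sports": {"match", "goal", "cricket"},
    "politics": {"election", "vote", "senate"},
}
# Catch-all stream so that every tweet lands somewhere.
FALLBACK = "other"


def classify(tweet: str) -> str:
    """Return exactly one subject label for a tweet.

    Assumption: the subject with the largest keyword overlap wins.
    Ties go to the subject listed first in SUBJECT_KEYWORDS.
    """
    tokens = set(re.findall(r"[a-z0-9']+", tweet.lower()))
    best, best_score = FALLBACK, 0
    for subject, keywords in SUBJECT_KEYWORDS.items():
        score = len(tokens & keywords)
        if score > best_score:
            best, best_score = subject, score
    return best


def split_stream(tweets: Iterable[str]) -> Dict[str, List[str]]:
    """Partition tweets into per-subject streams, one label per tweet."""
    streams: Dict[str, List[str]] = defaultdict(list)
    for tweet in tweets:
        streams[classify(tweet)].append(tweet)
    return dict(streams)


if __name__ == "__main__":
    sample = [
        "What a goal in the cricket match",
        "Vote in the election today",
        "Good morning everyone",
    ]
    for subject, items in split_stream(sample).items():
        print(subject, items)
```

Because `classify` returns a single label per tweet, the stream sizes always sum to the number of input tweets. A downstream data mining algorithm could then be run on each stream independently.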

---
Source: https://tomesphere.com/paper/1705.09995