Methods for Generating Drift in Text Streams

Cristiano Mesquita Garcia; Alessandro Lameiras Koerich; Alceu de Souza Britto Jr; Jean Paul Barddal

arXiv:2403.12328 · cs.LG · March 20, 2024 · 1 citation

Open Access

TL;DR

This paper introduces four methods to generate labeled concept drifts in textual data streams, facilitating the creation of benchmark datasets for evaluating drift detection and adaptation in machine learning models.

Contribution

The paper proposes novel textual drift generation techniques and evaluates them on real-world datasets using incremental classifiers.

Findings

01 All methods cause performance degradation after drifts.

02 Incremental SVM recovers fastest in accuracy and Macro F1-Score.

03 Methods help in benchmarking drift detection in text streams.

Abstract

Systems and individuals produce data continuously. On the Internet, people share their knowledge, sentiments, and opinions, provide reviews about services and products, and so on. Automatically learning from these textual data can provide insights to organizations and institutions, thus preventing financial impacts, for example. To learn from textual data over time, the machine learning system must account for concept drift. Concept drift is a frequent phenomenon in real-world datasets and corresponds to changes in data distribution over time. For instance, a concept drift occurs when sentiments change or a word's meaning is adjusted over time. Although concept drift is frequent in real-world applications, benchmark datasets with labeled drifts are rare in the literature. To bridge this gap, this paper provides four textual drift generation methods to ease the production of datasets…
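
To make the idea of a labeled drift concrete, here is a minimal sketch in Python. It is an illustration only and is not claimed to be one of the paper's four methods, which the truncated abstract above does not describe. It assumes a stream of (text, label) pairs and swaps two class labels from a chosen position onward, so the drift location is known by construction. The function name `inject_label_swap_drift`, its parameters, and the toy reviews are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def inject_label_swap_drift(
    stream: Iterable[Tuple[str, str]],
    drift_at: int,
    class_a: str,
    class_b: str,
) -> Iterator[Tuple[str, str, bool]]:
    """Yield (text, label, after_drift) for each item in the stream.

    From index `drift_at` onward, labels `class_a` and `class_b` are swapped;
    any other label passes through unchanged. This is an assumed, simplified
    drift generator, not the authors' implementation.
    """
    swap = {class_a: class_b, class_b: class_a}
    for i, (text, label) in enumerate(stream):
        after_drift = i >= drift_at
        if after_drift:
            label = swap.get(label, label)
        yield text, label, after_drift


# Toy data, invented for illustration.
reviews = [
    ("great food", "positive"),
    ("rude staff", "negative"),
    ("lovely room", "positive"),
    ("dirty sheets", "negative"),
]

drifted = list(
    inject_label_swap_drift(reviews, drift_at=2, class_a="positive", class_b="negative")
)
```

Because the third field records whether an item falls after the injected drift, a detector's alarms can be compared against a known ground-truth position.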

Taxonomy

Topics: Advanced Database Systems and Queries

Methods: Support Vector Machine