RiverText: A Python Library for Training and Evaluating Incremental Word Embeddings from Text Data Streams
Gabriel Iturra-Bocaz, Felipe Bravo-Marquez

TL;DR
RiverText is an open-source Python library that enables training and evaluating incremental word embeddings from streaming text data, facilitating adaptation to evolving language patterns in real-time scenarios like social media analysis.
Contribution
The paper introduces RiverText, a standardized framework for incremental word embeddings, integrating multiple techniques and adapting evaluation tasks for streaming data.
Findings
Incremental embeddings outperform static models in dynamic language scenarios.
Hyperparameter tuning significantly impacts embedding quality.
RiverText effectively processes real-time text streams for NLP tasks.
Abstract
Word embeddings have become essential components in various information retrieval and natural language processing tasks, such as ranking, document classification, and question answering. However, despite their widespread use, traditional word embedding models present a limitation in their static nature, which hampers their ability to adapt to the constantly evolving language patterns that emerge in sources such as social media and the web (e.g., new hashtags or brand names). To overcome this problem, incremental word embedding algorithms are introduced, capable of dynamically updating word representations in response to new language patterns and processing continuous data streams. This paper presents RiverText, a Python library for training and evaluating incremental word embeddings from text data streams. Our tool is a resource for the information retrieval and natural language…
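To make the idea of incremental updating concrete, here is a minimal sketch of a streaming word representation. It is not RiverText's API or any algorithm from the paper: the class `IncrementalCooccurrenceEmbedder`, its `learn_one` and `similarity` methods, and the `window_size` parameter are all hypothetical names chosen for illustration. The technique is plain co-occurrence counting, which is only meant to show how a vocabulary and its vectors can grow one text at a time without retraining.

```python
import math
from collections import Counter, defaultdict


class IncrementalCooccurrenceEmbedder:
    """Toy streaming model (illustrative only, not RiverText's API).

    Each word is represented by a sparse vector of context-word counts,
    updated one text at a time.
    """

    def __init__(self, window_size=2):
        self.window_size = window_size
        # word -> Counter mapping each context word to its co-occurrence count
        self.contexts = defaultdict(Counter)

    def learn_one(self, text):
        """Update the counts with a single text from the stream."""
        tokens = text.lower().split()
        for i, target in enumerate(tokens):
            start = max(0, i - self.window_size)
            end = min(len(tokens), i + self.window_size + 1)
            for j in range(start, end):
                if j != i:
                    self.contexts[target][tokens[j]] += 1

    def similarity(self, a, b):
        """Cosine similarity between two sparse count vectors (0.0 if unseen)."""
        va, vb = self.contexts.get(a), self.contexts.get(b)
        if not va or not vb:
            return 0.0
        dot = sum(count * vb[ctx] for ctx, count in va.items())
        norm_a = math.sqrt(sum(c * c for c in va.values()))
        norm_b = math.sqrt(sum(c * c for c in vb.values()))
        return dot / (norm_a * norm_b)


# A made-up stream: the last text introduces a previously unseen hashtag.
stream = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "#newtag is trending today",
]

model = IncrementalCooccurrenceEmbedder(window_size=2)
seen_before_last = None
for text in stream:
    seen_before_last = "#newtag" in model.contexts  # state before this update
    model.learn_one(text)
```

After the loop, `#newtag` has a representation even though it was absent when the stream started, which is the behavior a static, pre-trained model cannot provide. A practical incremental embedding method would also have to bound memory and vocabulary size, which this sketch does not attempt.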
