# Character-based Neural Embeddings for Tweet Clustering

**Authors:** Svitlana Vakulenko, Lyndon Nixon, Mihai Lupu

arXiv: 1703.05123 · 2017-03-17

## TL;DR

This paper shows that character-based neural networks improve tweet clustering: they avoid the vocabulary explosion that limits word-based models and process multilingual content seamlessly. The evaluation results and code are publicly available.

## Contribution

It applies character-based neural embeddings to tweet clustering, overcoming the vocabulary explosion of word-based models and allowing multilingual content to be processed seamlessly.
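
The vocabulary explosion mentioned above can be illustrated with a minimal sketch. It is not taken from the paper or its repository: the helper names `word_vocab` and `char_vocab` and the example tweets are hypothetical, chosen only to show that a word-level vocabulary keeps growing with spelling variants, hashtags, and other languages, while a character-level vocabulary quickly saturates.

```python
def word_vocab(tweets):
    """Return the set of distinct whitespace-separated tokens (hypothetical helper)."""
    return {token for tweet in tweets for token in tweet.split()}


def char_vocab(tweets):
    """Return the set of distinct characters (hypothetical helper)."""
    return {ch for tweet in tweets for ch in tweet}


# Made-up tweets with spelling variants, a hashtag, and a German sentence.
tweets = [
    "great match tonight",
    "greaaat match 2nite!!",
    "tolles Spiel heute",
    "#match gr8 tonite",
]

# Print how both vocabularies grow as tweets are added one at a time.
for n in range(1, len(tweets) + 1):
    seen = tweets[:n]
    print(n, len(word_vocab(seen)), len(char_vocab(seen)))

# A new tweet built only from already-seen characters adds a new word
# to the word vocabulary but nothing to the character vocabulary.
extended = tweets + ["great spiel"]
print(len(word_vocab(extended)) - len(word_vocab(tweets)))
print(len(char_vocab(extended)) - len(char_vocab(tweets)))
```

A word-based model needs an entry for every surface form it may encounter, whereas a character-based model only needs the (much smaller) set of characters, which is one reason such models cope better with noisy and multilingual text.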

## Key findings

- Character-based neural networks improve tweet clustering performance.
- Effective handling of multilingual tweets without language-specific preprocessing.
- Evaluation results and code are available online at https://github.com/vendi12/tweet2vec_clustering for reproducibility and further research.
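
As a rough illustration of clustering tweets from character-level representations, the sketch below may help. It is an assumption-laden stand-in, not the authors' method: the paper relies on a character-based neural network, whereas this example substitutes plain character trigram counts, cosine similarity, and a simple greedy assignment. The function names, the threshold value, and the example tweets are all hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of the lowercased, space-padded text.

    Stand-in for a learned character-based embedding (assumption).
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(tweets, threshold=0.3):
    """Assign each tweet to the first cluster whose seed tweet is similar enough.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    vectors = [char_ngrams(tweet) for tweet in tweets]
    seeds = []   # index of the first tweet in each cluster
    labels = []  # cluster id per tweet
    for index, vector in enumerate(vectors):
        for cluster_id, seed in enumerate(seeds):
            if cosine(vector, vectors[seed]) >= threshold:
                labels.append(cluster_id)
                break
        else:
            # No existing cluster is close enough: start a new one.
            seeds.append(index)
            labels.append(len(seeds) - 1)
    return labels


example_tweets = [
    "earthquake hits city",
    "earthquakes hit the city",
    "new phone released",
    "new phones release",
]
labels = greedy_cluster(example_tweets)
print(labels)
```

Because the representation is built from characters rather than whole words, inflected or misspelled variants such as "earthquake" and "earthquakes" still share most of their features, so no stemming or language-specific tokenizer is needed in this sketch.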

## Abstract

In this paper we show how the performance of tweet clustering can be improved by leveraging character-based neural networks. The proposed approach overcomes the limitations related to the vocabulary explosion in the word-based models and allows for the seamless processing of the multilingual content. Our evaluation results and code are available on-line at https://github.com/vendi12/tweet2vec_clustering

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.05123/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1703.05123/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1703.05123/full.md

---
Source: https://tomesphere.com/paper/1703.05123