Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder

Soroush Vosoughi; Prashanth Vijayaraghavan; Deb Roy

arXiv:1607.07514 · cs.CL · July 27, 2016 · 63 cites

TL;DR

Tweet2Vec is a character-level CNN-LSTM encoder-decoder that produces general-purpose tweet embeddings. It outperforms previous methods on tweet semantic similarity and tweet sentiment categorization, and the approach can be adapted to other languages.

Contribution

The paper presents a novel character-level CNN-LSTM model for tweet embedding that surpasses the previous state of the art on two tweet tasks: semantic similarity and sentiment categorization.

Findings

1. Outperforms the previous state of the art in tweet semantic similarity.
2. Outperforms the previous state of the art in tweet sentiment categorization.
3. Produces generic embeddings that can be applied to a variety of tweet tasks.

Abstract

We present Tweet2Vec, a novel method for generating general-purpose vector representations of tweets. The model learns tweet embeddings using a character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated on two tasks, tweet semantic similarity and tweet sentiment categorization, and outperformed the previous state of the art on both. These evaluations demonstrate the power of the tweet embeddings generated by our model for tweet categorization tasks. Because the vector representations generated by our model are generic, they can be applied to a variety of tasks. Although the model presented in this paper is trained on English-language tweets, the method can be used to learn tweet embeddings for other languages.
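
The abstract describes the architecture only at a high level: character-level convolutions feeding an LSTM encoder, trained jointly with a decoder. The sketch below is a minimal, untrained forward pass of such an encoder in plain Python. It assumes a toy alphabet, a single convolution layer with ReLU and max-pooling, and one LSTM whose final hidden state is taken as the tweet embedding. All sizes, the names CharCNNLSTMEncoder, encode and cosine, and the random initialization are hypothetical; the paper's actual layer counts, filter sizes, decoder and training procedure are not reproduced here.

```python
import math
import random

# Toy hyperparameters (hypothetical; not the paper's actual settings).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 #@.,!?'"
EMB_DIM = 8     # size of each character embedding
N_FILTERS = 6   # number of convolution filters
KERNEL = 3      # convolution window width, in characters
POOL = 2        # non-overlapping max-pooling width
HIDDEN = 10     # LSTM hidden size, i.e. the tweet-embedding size


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class CharCNNLSTMEncoder:
    """Untrained forward pass: chars -> conv + ReLU + max-pool -> LSTM -> last hidden state."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.index = {c: i for i, c in enumerate(ALPHABET)}
        # One row per known character plus a final row shared by unknown characters.
        self.char_emb = rand_matrix(len(ALPHABET) + 1, EMB_DIM, rng)
        # Each filter sees KERNEL consecutive character embeddings, flattened.
        self.filters = rand_matrix(N_FILTERS, KERNEL * EMB_DIM, rng)
        # LSTM gates stacked as [input, forget, output, candidate]; input is [x; h].
        self.w_lstm = rand_matrix(4 * HIDDEN, N_FILTERS + HIDDEN, rng)
        self.b_lstm = [rng.uniform(-0.1, 0.1) for _ in range(4 * HIDDEN)]

    def embed_chars(self, text):
        unk = len(ALPHABET)
        return [self.char_emb[self.index.get(c, unk)] for c in text.lower()]

    def conv_relu_pool(self, chars):
        # "Valid" 1-D convolution along the character axis, followed by ReLU.
        feats = []
        for t in range(len(chars) - KERNEL + 1):
            window = [x for vec in chars[t:t + KERNEL] for x in vec]
            feats.append([max(0.0, s) for s in matvec(self.filters, window)])
        # Max-pool each filter's output over non-overlapping windows of POOL steps.
        return [[max(col) for col in zip(*feats[t:t + POOL])]
                for t in range(0, len(feats), POOL)]

    def lstm_last_state(self, seq):
        h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
        for x in seq:
            z = [a + b for a, b in zip(matvec(self.w_lstm, x + h), self.b_lstm)]
            i = [sigmoid(v) for v in z[:HIDDEN]]
            f = [sigmoid(v) for v in z[HIDDEN:2 * HIDDEN]]
            o = [sigmoid(v) for v in z[2 * HIDDEN:3 * HIDDEN]]
            g = [math.tanh(v) for v in z[3 * HIDDEN:]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

    def encode(self, tweet):
        # Pad with spaces so that even very short tweets yield one convolution window.
        chars = self.embed_chars(tweet.ljust(KERNEL))
        return self.lstm_last_state(self.conv_relu_pool(chars))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


enc = CharCNNLSTMEncoder(seed=0)
a = enc.encode("loving the new phone!")
b = enc.encode("Loving the new phone!!")
print(len(a), round(cosine(a, b), 3))
```

With random weights the cosine score carries no meaning; the example only shows how tweets of different lengths map to fixed-length vectors that can be compared or fed to a downstream classifier. Working from characters rather than words also means no word vocabulary is needed, which is one plausible reason the method can be retargeted to another language by changing the character set and retraining.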

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining