An Empirical Survey of Unsupervised Text Representation Methods on Twitter Data

Lili Wang; Chongyang Gao; Jason Wei; Weicheng Ma; Ruibo Liu; Soroush Vosoughi

arXiv:2012.03468 · cs.CL · December 8, 2020

TL;DR

This paper empirically evaluates various unsupervised text representation methods on Twitter data, revealing that advanced models do not always outperform simpler ones in noisy, user-generated text clustering tasks.

Contribution

It provides a comprehensive experimental comparison of text representation techniques specifically on Twitter data, highlighting the need for further research in this area.

Findings

01. Advanced models do not always outperform simpler ones on Twitter data

02. Noisy user-generated text poses challenges for existing representation methods

03. Further exploration is needed for effective text representations in social media contexts

Abstract

The field of NLP has seen unprecedented achievements in recent years. Most notably, with the advent of large-scale pre-trained Transformer-based language models, such as BERT, there has been a noticeable improvement in text representation. It is, however, unclear whether these improvements translate to noisy user-generated text, such as tweets. In this paper, we present an experimental survey of a wide range of well-known text representation techniques for the task of text clustering on noisy Twitter data. Our results indicate that the more advanced models do not necessarily work best on tweets and that more exploration in this area is needed.

Taxonomy

Methods: Linear Layer · Linear Warmup With Linear Decay · WordPiece · Residual Connection · Multi-Head Attention · Adam · Dense Connections · Weight Decay · Dropout