# Improving short text classification through global augmentation methods

**Authors:** Vukosi Marivate, Tshephisho Sefara

arXiv: 1907.03752 · 2020-12-11

## TL;DR

This paper evaluates various text augmentation techniques for short text classification across social media and news datasets, highlighting Word2vec-based augmentation and mixup as effective methods, while noting the limitations of round-trip translation.

## Contribution

The paper compares text augmentation methods on three datasets spanning social media and news text, and gives practitioners and researchers guidance on choosing an augmentation method for classification, including in low-resource settings.

## Key findings

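- Word2vec-based augmentation is a viable option when no formal synonym model (such as WordNet) is available.
- Mixup further improves the performance of all text-based augmentations and reduces overfitting in the tested deep learning model.
- Round-trip translation through a translation service is harder to use because of cost, which makes it less accessible for both normal and low-resource use cases.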
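
To make the Word2vec finding concrete, here is a minimal sketch of embedding-neighbour substitution: a token is swapped for one of its nearest neighbours by cosine similarity. This summary does not state the paper's exact procedure, so the function names, `replace_prob`, `top_k`, and the hand-written two-dimensional vectors in `TOY_EMBEDDINGS` are illustrative assumptions, not the authors' setup. In practice the vectors would come from a trained Word2vec model.

```python
import math
import random

# Hypothetical toy vectors standing in for a trained Word2vec model.
TOY_EMBEDDINGS = {
    "good": [0.9, 0.1],
    "great": [0.85, 0.15],
    "bad": [-0.9, 0.1],
    "awful": [-0.85, 0.2],
    "movie": [0.0, 1.0],
    "film": [0.05, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, embeddings, top_k=3):
    """Return up to top_k vocabulary words most similar to `word`."""
    if word not in embeddings:
        return []
    target = embeddings[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_k]]


def augment(tokens, embeddings, replace_prob=0.3, top_k=3, seed=None):
    """Replace each in-vocabulary token with a neighbour with probability replace_prob."""
    rng = random.Random(seed)
    augmented = []
    for token in tokens:
        candidates = nearest_neighbors(token, embeddings, top_k)
        if candidates and rng.random() < replace_prob:
            augmented.append(rng.choice(candidates))
        else:
            # Out-of-vocabulary or unselected tokens are kept unchanged.
            augmented.append(token)
    return augmented
```

Calling `augment("a good movie".split(), TOY_EMBEDDINGS, replace_prob=1.0, top_k=1)` replaces every in-vocabulary token with its single nearest neighbour and leaves `"a"` untouched, because it has no vector. No synonym dictionary is involved, which is the point of the first finding.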

## Abstract

We study the effect of different approaches to text augmentation. To do this we use 3 datasets that include social media and formal text in the form of news articles. Our goal is to provide insights for practitioners and researchers on making choices for augmentation for classification use cases. We observe that Word2vec-based augmentation is a viable option when one does not have access to a formal synonym model (like WordNet-based augmentation). The use of *mixup* further improves performance of all text based augmentations and reduces the effects of overfitting on a tested deep learning model. Round-trip translation with a translation service proves to be harder to use due to cost and as such is less accessible for both normal and low resource use-cases.
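
For readers unfamiliar with mixup, the sketch below shows the standard formulation: two training examples and their one-hot labels are blended with a coefficient drawn from a Beta(α, α) distribution. The summary does not say where in the model the paper applies mixup or which α it uses, so the function names, `alpha=0.2`, and the assumption that examples are already fixed-length feature vectors are illustrative only.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two feature vectors and their one-hot labels."""
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y


def mixup_batch(features, labels, alpha=0.2, seed=None):
    """Mix each example with a randomly chosen partner from the same batch.

    alpha is an assumed hyperparameter; one coefficient is drawn per batch.
    """
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    partners = list(range(len(features)))
    rng.shuffle(partners)
    mixed_x, mixed_y = [], []
    for i, j in enumerate(partners):
        x, y = mixup_pair(features[i], labels[i], features[j], labels[j], lam)
        mixed_x.append(x)
        mixed_y.append(y)
    return mixed_x, mixed_y, lam
```

Because the labels are mixed with the same coefficient as the inputs, each mixed label still sums to 1 and can be used directly with a cross-entropy loss over soft targets.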

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.03752/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1907.03752/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1907.03752/full.md

---
Source: https://tomesphere.com/paper/1907.03752