Evaluation of Representation Models for Text Classification with AutoML Tools

Sebastian Brändle, Marc Hanussek, Matthias Blohm, and Maximilien Kintz

arXiv:2106.12798·cs.CL·July 8, 2021

TL;DR

This paper benchmarks manually created text representations against embeddings generated automatically by AutoML tools for text classification, finding that the manual representations perform better across multiple datasets.

Contribution

It provides a comparative analysis of manual and auto-generated text representations using open-source AutoML tools for text classification.

Findings

01

Manual text representations outperform AutoML-generated embeddings.

02

AutoML tools struggle with unstructured text data.

03

The benchmark covers four open-source AutoML tools and eight text classification datasets.

Abstract

Automated Machine Learning (AutoML) has been increasingly successful on tabular data in recent years. However, processing unstructured data such as text remains a challenge and is not widely supported by open-source AutoML tools. This work compares three manually created text representations with text embeddings created automatically by AutoML tools. Our benchmark includes four popular open-source AutoML tools and eight datasets for text classification. The results show that straightforward text representations perform better than AutoML tools with automatically created text embeddings.
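The abstract does not name the three manual representations. As a hedged illustration only, the sketch assumes a TF-IDF bag-of-words representation (our choice, not necessarily one of the paper's) and shows how raw text can be turned into a fixed-width numeric table, the kind of tabular input on which AutoML tools are already strong. The helper names `tokenize` and `tfidf_table` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace only; a real pipeline
    # would also strip punctuation and possibly remove stop words.
    return text.lower().split()


def tfidf_table(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Turn raw documents into a dense TF-IDF feature table (one row per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, matching scikit-learn's default formula: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        row = [counts[t] / total * idf[t] for t in vocab]
        # L2-normalise each row so document length does not dominate the features.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


docs = ["the movie was great", "the movie was terrible", "great acting"]
vocab, X = tfidf_table(docs)
print(vocab)  # prints ['acting', 'great', 'movie', 'terrible', 'the', 'was']
```

Each column of `X` corresponds to one vocabulary term, so the result can be handed to a tabular AutoML tool as an ordinary feature matrix alongside the class labels.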
