Evaluation of Representation Models for Text Classification with AutoML Tools
Sebastian Brändle, Marc Hanussek, Matthias Blohm, and Maximilien Kintz

TL;DR
This paper benchmarks different text representation methods and AutoML tools for text classification, finding that manual representations outperform auto-generated embeddings across multiple datasets.
Contribution
It provides a comparative analysis of manual and auto-generated text representations using open-source AutoML tools for text classification.
Findings
Manual text representations outperform AutoML-generated embeddings.
Processing unstructured text remains a challenge and is not widely supported by open-source AutoML tools.
The benchmark covers four open-source AutoML tools and eight text classification datasets.
Abstract
Automated Machine Learning (AutoML) has gained increasing success on tabular data in recent years. However, processing unstructured data like text is a challenge and not widely supported by open-source AutoML tools. This work compares three manually created text representations and text embeddings automatically created by AutoML tools. Our benchmark includes four popular open-source AutoML tools and eight datasets for text classification purposes. The results show that straightforward text representations perform better than AutoML tools with automatically created text embeddings.
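To make the idea of a "manually created text representation" concrete, the sketch below turns raw documents into a fixed-width numeric table that a tabular AutoML tool could consume. It uses TF-IDF purely as an illustration: the summary above does not name the three representations the authors evaluated, so the choice of TF-IDF, the whitespace tokenizer, and the helper names (`tokenize`, `fit_vocabulary`, `transform`) are all assumptions rather than the paper's method.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def fit_vocabulary(docs):
    """Learn a sorted vocabulary and smoothed IDF weights from the documents."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document.
        doc_freq.update(set(tokenize(doc)))
    vocab = sorted(doc_freq)
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1, so no weight is zero.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def transform(docs, vocab, idf):
    """Map each document to an L2-normalized TF-IDF row (one column per term)."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm if norm else 0.0 for v in row])
    return rows


# Toy corpus for illustration only; not data from the paper.
docs = ["good movie", "bad movie", "good good plot"]
vocab, idf = fit_vocabulary(docs)
X = transform(docs, vocab, idf)
```

Each row of `X` is an ordinary numeric feature vector, so it can be paired with class labels and handed to any tabular learner. This is the sense in which a simple representation lets a tabular AutoML tool handle text without producing its own embeddings.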
