Tabular Learning: Encoding for Entity and Context Embeddings

Fredy Reusser

arXiv:2403.19405·cs.LG·March 29, 2024·2 cites

TL;DR

This paper evaluates various encoding techniques for entity and context embeddings in tabular learning, demonstrating that string similarity-based encoding outperforms ordinal encoding, especially with transformer architectures in multi-label classification.

Contribution

It introduces a benchmark comparing encoding methods for categorical data, highlighting the superiority of string similarity encoding over ordinal encoding in tabular learning tasks.

Findings

01. String similarity encoding improves classification accuracy.

02. Transformers perform better with similarity-based encodings.

03. Ordinal encoding is less effective for categorical data.

Abstract

This work examines the effect of different encoding techniques on entity and context embeddings, with the goal of challenging the commonly used ordinal encoding for tabular learning. Applying different preprocessing methods and network architectures to several datasets produced a benchmark of how the encoders influence the learning outcome of the networks. With the test, validation, and training data kept consistent, the results show that ordinal encoding is not the best-suited encoder for categorical data, in terms of both preprocessing the data and subsequently classifying the target variable correctly. A better outcome was achieved by encoding the features based on string similarities, computing a similarity matrix as input for the network. This holds for both entity and context embeddings, where the transformer architecture showed improved performance for Ordinal and Similarity…
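To make the contrast in the abstract concrete, the following is a minimal sketch of ordinal encoding next to a string-similarity encoding that builds a similarity matrix over the categories. It is an illustration under assumptions, not the paper's implementation: the abstract does not say which similarity measure is used, so Python's `difflib.SequenceMatcher` ratio stands in for it here, and the function names and the example column are hypothetical.

```python
from difflib import SequenceMatcher


def ordinal_encode(values):
    """Map each distinct category to an integer, using sorted order."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values], categories


def similarity_encode(values):
    """Encode each value as its row of string similarities to every category.

    Assumption: SequenceMatcher.ratio() is used as the similarity measure;
    the source does not specify which measure the paper uses.
    """
    categories = sorted(set(values))
    # Similarity matrix: one row per distinct category, one column per category.
    matrix = {
        a: [SequenceMatcher(None, a, b).ratio() for b in categories]
        for a in categories
    }
    return [matrix[v] for v in values], categories


# Hypothetical categorical column, made up for illustration.
column = ["engineer", "engineering", "nurse", "engineer"]

codes, cats = ordinal_encode(column)
rows, _ = similarity_encode(column)

print(cats)
print(codes)
for value, row in zip(column, rows):
    print(value, [round(x, 2) for x in row])
```

In this sketch the ordinal codes place "engineer", "engineering", and "nurse" at equal integer steps, while the similarity rows keep "engineer" close to "engineering" and far from "nurse". That retained closeness between related strings is the kind of information a similarity matrix can pass to the network and an integer code cannot.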


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Video Analysis and Summarization