Using Person Embedding to Enrich Features and Data Augmentation for Classification

Ahmet Tuğrul Bayrak

arXiv:2206.15162 · cs.LG · July 1, 2022

TL;DR

This paper explores using customer embeddings to enhance feature representation and data augmentation in classification tasks, specifically for fraud detection, leading to improved model performance.

Contribution

It introduces a novel approach of applying customer embeddings for feature enrichment and re-labeling to boost classification accuracy in imbalanced datasets.

Findings

01. Customer embedding improves classification success

02. Re-labeling similar customers increases positive samples

03. Embedding-based features outperform traditional features

Abstract

Machine learning is now applied in almost every field, and classification is among its most basic and important methods, applicable to a wide range of problems. Feature selection is critical to a model's success, and so is creating new features through feature engineering. In our study, fraud detection classification models are built on a labeled, imbalanced dataset as a case study. Word embedding, a natural language processing method that has also been used in other areas, especially recommender systems, is used to create a customer space. The customer vectors in this space are fed to the classification model as features. Moreover, to increase the number of positive labels, rows with similar characteristics are re-labeled as positive by using…
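The two ideas in the abstract, feeding customer vectors to the classifier as features and re-labeling rows that resemble known positives, can be illustrated with a minimal sketch. The abstract is truncated before it names the similarity method, so this sketch assumes cosine similarity with a fixed threshold. The function names (`enrich_features`, `relabel_similar`), the threshold value, and the toy vectors are hypothetical and are not taken from the paper. The customer vectors are assumed to be non-zero and to have been produced beforehand by some word-embedding model.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b.

    Assumes no row is the zero vector.
    """
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


def enrich_features(base_features, customer_vectors):
    """Append each row's customer embedding to its existing feature columns."""
    return np.hstack([base_features, customer_vectors])


def relabel_similar(customer_vectors, labels, threshold=0.95):
    """Mark rows as positive when they are close to any known positive row.

    The cosine threshold is an illustrative assumption, not the paper's rule.
    """
    labels = np.asarray(labels)
    positives = customer_vectors[labels == 1]
    if len(positives) == 0:
        return labels.copy()
    sims = cosine_similarity_matrix(customer_vectors, positives)
    near_positive = sims.max(axis=1) >= threshold
    return np.where(near_positive, 1, labels)


# Toy data: four customers in a 2-dimensional embedding space,
# with only the first one labeled as fraud (1).
vectors = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [-1.0, 0.0]])
labels = np.array([1, 0, 0, 0])
base = np.zeros((4, 3))  # placeholder for the original tabular features

augmented_labels = relabel_similar(vectors, labels)
enriched = enrich_features(base, vectors)
```

In this toy setup the second customer lies almost in the same direction as the labeled positive and is re-labeled, while the other two are left unchanged. In practice the threshold would need to be tuned on validation data, since an overly loose value would add noisy positive labels.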

Taxonomy

Topics: Spam and Phishing Detection · Text and Document Classification Technologies · Imbalanced Data Classification Techniques

Methods: Feature Selection