Using Person Embedding to Enrich Features and Data Augmentation for Classification
Ahmet Tuğrul Bayrak

TL;DR
This paper explores using customer embeddings to enhance feature representation and data augmentation in classification tasks, specifically for fraud detection, leading to improved model performance.
Contribution
It introduces a novel approach of applying customer embeddings for feature enrichment and re-labeling to boost classification accuracy in imbalanced datasets.
Findings
Customer embedding improves classification success
Re-labeling similar customers increases positive samples
Embedding-based features outperform traditional features
Abstract
Machine learning is now applied in almost every field, and classification is one of its most basic and important tasks. Feature selection is critical when building a model, and new features produced through feature engineering also contribute substantially to the model's success. In this study, fraud detection classification models are built on a labeled, imbalanced dataset as a case study. Word embedding, a natural language processing method that has also been used in other areas, especially recommender systems, is used to create a customer space. The customer vectors in this space are fed to the classification model as features. Moreover, to increase the number of positive labels, rows with similar characteristics are re-labeled as positive by using…
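The two steps the abstract describes, feeding customer vectors to the classifier as features and re-labeling similar rows as positive, can be illustrated with a minimal sketch. The abstract is truncated before it names the similarity criterion, so the cosine similarity measure, the 0.95 threshold, the function names, and the toy data below are all assumptions made for illustration and are not taken from the paper. The customer embeddings are assumed to have been trained already.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b.

    Assumes no row is the zero vector.
    """
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


def enrich_features(features, embeddings):
    """Append each customer's embedding vector to that customer's feature row."""
    return np.hstack([features, embeddings])


def relabel_similar(embeddings, labels, threshold=0.95):
    """Return a copy of labels where negatives close to any positive become positive.

    The similarity measure and threshold are hypothetical choices.
    """
    labels = np.asarray(labels).copy()
    positives = embeddings[labels == 1]
    if len(positives) == 0:
        return labels
    sims = cosine_similarity_matrix(embeddings, positives)
    close_to_positive = sims.max(axis=1) >= threshold
    labels[close_to_positive & (labels == 0)] = 1
    return labels


# Toy data: four customers, 2-dimensional embeddings, two hand-made features.
embeddings = np.array([
    [1.0, 0.0],    # known fraud case
    [0.99, 0.05],  # nearly identical direction to the fraud case
    [0.0, 1.0],
    [-1.0, 0.2],
])
features = np.array([
    [120.0, 3.0],
    [115.0, 2.0],
    [20.0, 1.0],
    [35.0, 4.0],
])
labels = np.array([1, 0, 0, 0])

X = enrich_features(features, embeddings)
new_labels = relabel_similar(embeddings, labels, threshold=0.95)
```

In this toy setup only the second customer is re-labeled, because its vector points in almost the same direction as the known positive. The threshold trades off the number of extra positive samples against the risk of introducing label noise, so in practice it would need to be validated against held-out labeled data.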
Taxonomy
Topics: Spam and Phishing Detection · Text and Document Classification Technologies · Imbalanced Data Classification Techniques
Methods: Feature Selection
