# An Empirical Analysis of Feature Engineering for Predictive Modeling

**Authors:** Jeff Heaton

arXiv: 1701.07852 · 2020-11-03

## TL;DR

This paper tests empirically which kinds of engineered features improve which machine learning models, and which features a model can synthesize on its own, to show when manual feature engineering is worth the effort.

## Contribution

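The paper generates several datasets, each designed to benefit from one particular type of engineered feature, and measures to what degree each studied model can synthesize that feature on its own. The results serve as a recommendation for which engineered features to provide to which model type.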

## Key findings

- The studied models perform differently with different types of engineered features.
- A model can sometimes synthesize an engineered feature on its own; in that case, providing the feature manually is unnecessary.
- Which feature types improve performance depends on the model used.

## Abstract

Machine learning models, such as neural networks, decision trees, random forests, and gradient boosting machines, accept a feature vector, and provide a prediction. These models learn in a supervised fashion where we provide feature vectors mapped to the expected output. It is common practice to engineer new features from the provided feature set. Such engineered features will either augment or replace portions of the existing feature vector. These engineered features are essentially calculated fields based on the values of the other features. Engineering such features is primarily a manual, time-consuming task. Additionally, each type of model will respond differently to different kinds of engineered features. This paper reports empirical research to demonstrate what kinds of engineered features are best suited to various machine learning model types. We provide this recommendation by generating several datasets that we designed to benefit from a particular type of engineered feature. The experiment demonstrates to what degree the machine learning model can synthesize the needed feature on its own. If a model can synthesize a planned feature, it is not necessary to provide that feature. The research demonstrated that the studied models do indeed perform differently with various types of engineered features.
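The experimental idea in the abstract can be illustrated with a minimal sketch. It is not the paper's code or protocol: the ratio target, the value ranges, the sample sizes, and the plain least-squares linear model are all assumptions chosen so that the example runs with the Python standard library alone. The sketch generates a dataset whose target is exactly the ratio of two raw features, then compares the test error of a model given only the raw features against the same model given the engineered ratio.

```python
import math
import random


def make_ratio_dataset(n, seed):
    """Synthetic data whose target is the ratio x1 / x2 (illustrative choice)."""
    rng = random.Random(seed)
    X = [(rng.uniform(1.0, 10.0), rng.uniform(1.0, 10.0)) for _ in range(n)]
    y = [a / b for a, b in X]
    return X, y


def fit_linear(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for j in range(col, n):
                A[k][j] -= f * A[col][j]
            b[k] -= f * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def run_experiment():
    """Return (error with raw features, error with the engineered ratio)."""
    X_train, y_train = make_ratio_dataset(500, seed=0)
    X_test, y_test = make_ratio_dataset(200, seed=1)

    # Model sees only the raw features and must synthesize the ratio itself.
    w_raw = fit_linear(X_train, y_train)
    raw_err = rmse(y_test, predict(w_raw, X_test))

    # Model is handed the engineered ratio feature directly.
    eng_train = [(a / b,) for a, b in X_train]
    eng_test = [(a / b,) for a, b in X_test]
    w_eng = fit_linear(eng_train, y_train)
    eng_err = rmse(y_test, predict(w_eng, eng_test))
    return raw_err, eng_err


if __name__ == "__main__":
    raw_err, eng_err = run_experiment()
    print(f"RMSE with raw features:      {raw_err:.4f}")
    print(f"RMSE with engineered ratio:  {eng_err:.4f}")
```

A large gap between the two errors suggests that this model type cannot synthesize the feature and benefits from having it supplied; a small gap suggests the feature is unnecessary for that model. A linear model is used here only because it is easy to fit without dependencies; the model types named in the abstract would be substituted in the same comparison.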

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.07852/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1701.07852/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1701.07852/full.md

---
Source: https://tomesphere.com/paper/1701.07852