# Effect fusion using model-based clustering

**Authors:** Gertraud Malsiner-Walli, Daniela Pauger, Helga Wagner

arXiv: 1703.07603 · 2017-03-23

## TL;DR

This paper introduces a Bayesian model-based clustering method that fuses the effects of categorical covariate levels in regression models. It automatically groups levels with essentially the same effect and identifies variables with no effect at all.

## Contribution

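The paper specifies the prior on the level effects as a location mixture of spiky normal components, combined with a prior on the mixture weights that encourages empty components. Model-based clustering of the effects during MCMC sampling then detects levels with essentially the same effect size and variables with no effect.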

## Key findings

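- Simulation studies investigate how well the approach groups levels with essentially the same effect
- Variables with no effect are identified at the same time as level effects are fused
- The method is applied to high-dimensional categorical predictors of income in Austria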

## Abstract

In social and economic studies many of the collected variables are measured on a nominal scale, often with a large number of categories. The definition of categories is usually not unambiguous and different classification schemes using either a finer or a coarser grid are possible. Categorisation has an impact when such a variable is included as a covariate in a regression model: too fine a grid will result in imprecise estimates of the corresponding effects, whereas with too coarse a grid important effects will be missed, resulting in biased effect estimates and poor predictive performance. To achieve automatic grouping of levels with essentially the same effect, we adopt a Bayesian approach and specify the prior on the level effects as a location mixture of spiky normal components. Fusion of level effects is induced by a prior on the mixture weights which encourages empty components. Model-based clustering of the effects during MCMC sampling makes it possible to simultaneously detect categories which have essentially the same effect size and identify variables with no effect at all. The properties of this approach are investigated in simulation studies. Finally, the method is applied to analyse effects of high-dimensional categorical predictors on income in Austria.
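To make the prior described in the abstract more concrete, the following is a minimal sketch of how a location mixture of spiky normal components with sparse weights might be simulated. It is not the authors' implementation. The symmetric Dirichlet prior with concentration `e0`, the fixed component means, the spike standard deviation `spike_sd`, and all function names are assumptions introduced here for illustration; the abstract states only that the prior on the weights encourages empty components.

```python
import random


def sample_sparse_weights(n_components, e0, rng):
    """Draw weights from a symmetric Dirichlet(e0, ..., e0) via normalised gamma draws.

    Assumption: a Dirichlet prior with small e0 stands in for the paper's
    'prior on the mixture weights which encourages empty components'.
    """
    while True:
        draws = [rng.gammavariate(e0, 1.0) for _ in range(n_components)]
        total = sum(draws)
        if total > 0.0:  # redraw in the rare case that every draw underflows to zero
            return [d / total for d in draws]


def sample_level_effects(n_levels, component_means, spike_sd, e0, rng):
    """Allocate each level to a component, then draw its effect close to that component's mean."""
    weights = sample_sparse_weights(len(component_means), e0, rng)
    allocations = rng.choices(range(len(component_means)), weights=weights, k=n_levels)
    # A small spike_sd makes each component 'spiky': levels sharing a component
    # receive almost identical effects, which is what fusion means here.
    effects = [rng.gauss(component_means[k], spike_sd) for k in allocations]
    return weights, allocations, effects


def mean_occupied_components(e0, n_draws=500, seed=1):
    """Average number of non-empty components over repeated prior draws."""
    rng = random.Random(seed)
    component_means = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # hypothetical locations
    total = 0
    for _ in range(n_draws):
        _, allocations, _ = sample_level_effects(10, component_means, 0.05, e0, rng)
        total += len(set(allocations))
    return total / n_draws


# Compare a sparse prior (small e0) with a diffuse one (large e0).
print("mean occupied components, e0 = 0.01:", mean_occupied_components(0.01))
print("mean occupied components, e0 = 10.0:", mean_occupied_components(10.0))
```

Under these assumptions, a small `e0` concentrates the weight on a few components, so most components stay empty and many levels share nearly the same effect. A large `e0` spreads the levels across many components, which corresponds to little or no fusion.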

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.07603/full.md

## Figures

28 figures with captions in the complete paper: https://tomesphere.com/paper/1703.07603/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/1703.07603/full.md

---
Source: https://tomesphere.com/paper/1703.07603