Amalgamations in a hierarchy as a way of variable selection in compositional data analysis
Michael Greenacre, Martin Graeve

TL;DR
This paper introduces a hierarchical amalgamation approach for variable selection in compositional data analysis, utilizing domain knowledge to create meaningful subsets and assessing their explanatory power through logratio variance.
Contribution
It proposes a novel hierarchical amalgamation method for variable selection in compositional data, demonstrated on fatty acid data from marine organisms.
Findings
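Logratios of the hierarchical amalgamations explain a substantial percentage of the total logratio variance.
The method provides an alternative to traditional variable selection in compositional data analysis.
An application to fatty acid compositions of marine organisms illustrates its practical utility.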
Abstract
In certain fields where compositional data are studied, the compositional components, called parts, can be combined into certain subsets, called amalgamations, that are based on domain knowledge. Furthermore, these subsets can form a natural hierarchy of amalgamations subdividing into sub-amalgamations. The authors, a statistician and a biochemist, demonstrate how to create a hierarchy of amalgamations in the context of fatty acid compositions in a sample of marine organisms. Following a tradition in compositional data analysis, these amalgamations are transformed to logratios, and their usefulness as new variables is quantified by the percentage of total logratio variance that they explain. This method is proposed as an alternative method of variable selection in compositional data analysis.
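The quantity described in the abstract, the percentage of total logratio variance explained by amalgamation logratios, can be illustrated with a minimal sketch. This is not the authors' code. It assumes equally weighted parts, uses synthetic data rather than the fatty acid data, and the grouping of parts into amalgamations is purely hypothetical. The function names (`clr`, `amalgamation_logratio`, `pct_logratio_variance_explained`) are illustrative. The explained percentage is computed by regressing the centered logratio (CLR) coordinates on the chosen amalgamation logratios, which is one common way to obtain it.

```python
import numpy as np

def clr(X):
    """Centered logratio transform of a compositional matrix (rows sum to 1)."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

def amalgamation_logratio(X, num_idx, den_idx):
    """Log of the ratio of two amalgamations (sums over subsets of parts)."""
    return np.log(X[:, num_idx].sum(axis=1) / X[:, den_idx].sum(axis=1))

def pct_logratio_variance_explained(X, predictors):
    """Percentage of total logratio variance explained by the given predictors.

    Assumption: unweighted parts. The total logratio variance is then
    proportional to the sum of squares of the column-centered CLR matrix,
    and the explained part is the sum of squares of its least-squares fit.
    """
    Y = clr(X)
    Y = Y - Y.mean(axis=0)
    Z = np.column_stack(predictors)
    Z = Z - Z.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    fitted = Z @ coef
    return 100.0 * (fitted ** 2).sum() / (Y ** 2).sum()

# Synthetic compositions: 30 samples, 6 parts (not real fatty acid data).
rng = np.random.default_rng(0)
raw = rng.gamma(shape=2.0, scale=1.0, size=(30, 6))
X = raw / raw.sum(axis=1, keepdims=True)

# Hypothetical hierarchy: parts 0-2 form group A, parts 3-5 form group B;
# within A, part 0 is split off from the sub-amalgamation of parts 1-2.
lr_top = amalgamation_logratio(X, [0, 1, 2], [3, 4, 5])
lr_sub = amalgamation_logratio(X, [0], [1, 2])

pct_one = pct_logratio_variance_explained(X, [lr_top])
pct_two = pct_logratio_variance_explained(X, [lr_top, lr_sub])
print(f"top-level logratio: {pct_one:.1f}%")
print(f"top-level plus sub-level logratio: {pct_two:.1f}%")
```

Adding a logratio from a lower level of the hierarchy can only keep or increase the explained percentage, so comparing `pct_one` with `pct_two` shows how much a further subdivision contributes.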
Taxonomy
Topics: Geochemistry and Geologic Mapping · Geological and Geochemical Analysis · Hydrocarbon exploration and reservoir analysis
