Three approaches to supervised learning for compositional data with pairwise logratios
Germa Coenders, Michael Greenacre

TL;DR
This paper introduces three supervised learning methods for selecting pairwise logratios in compositional data analysis, balancing prediction accuracy and interpretability, demonstrated through a Crohn's disease dataset.
Contribution
It proposes three novel stepwise supervised methods for selecting pairwise logratios tailored to different research needs, enhancing interpretability and predictive performance.
Findings
Method 1 (unrestricted search) achieves the highest predictive accuracy.
Method 2 offers intuitive interpretability.
Method 3 identifies subcompositions with high explanatory power.
Abstract
The common approach to compositional data analysis is to transform the data by means of logratios. Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in many research problems. When the number of parts is large, some form of logratio selection is a must, for instance by means of an unsupervised learning method based on a stepwise selection of the pairwise logratios that explain the largest percentage of the logratio variance in the compositional dataset. In this article we present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model, each geared for a specific problem. The first method features unrestricted search, where any pairwise logratio can be selected. This method has a complex interpretation if some pairs of parts in the logratios…
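To make the idea of stepwise supervised selection of pairwise logratios concrete, here is a minimal Python sketch of an unrestricted forward search. It is not the authors' implementation: it assumes an ordinary least-squares fit (the Gaussian special case of a generalized linear model) scored by R², runs a fixed number of steps instead of using a stopping criterion, and uses synthetic data. The function names `pairwise_logratios`, `r_squared`, and `forward_select` are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def pairwise_logratios(X, names):
    """Return all pairwise logratios log(x_i / x_j), i < j, with labels."""
    logX = np.log(X)
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    Z = np.column_stack([logX[:, i] - logX[:, j] for i, j in pairs])
    labels = [f"{names[i]}/{names[j]}" for i, j in pairs]
    return Z, labels

def r_squared(Z, y):
    """R^2 of a least-squares fit of y on the columns of Z plus an intercept."""
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_select(Z, y, n_steps):
    """Greedy unrestricted search: at each step add the logratio that
    raises R^2 the most (assumption: fixed step count, no penalty term)."""
    chosen = []
    for _ in range(n_steps):
        remaining = [k for k in range(Z.shape[1]) if k not in chosen]
        best = max(remaining, key=lambda k: r_squared(Z[:, chosen + [k]], y))
        chosen.append(best)
    return chosen

# Synthetic 5-part compositions; the response depends on log(A/C) only.
rng = np.random.default_rng(0)
names = ["A", "B", "C", "D", "E"]
X = rng.dirichlet(np.ones(5) * 5, size=200)
y = 2.0 * np.log(X[:, 0] / X[:, 2]) + rng.normal(scale=0.1, size=200)

Z, labels = pairwise_logratios(X, names)
selected = [labels[k] for k in forward_select(Z, y, n_steps=2)]
print(selected)
```

Because the synthetic response is built from log(A/C), the first selected logratio should be A/C. With real data one would replace the R² score with a criterion suited to the chosen generalized linear model and add a rule for when to stop adding logratios.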
Taxonomy
Topics: Geochemistry and Geologic Mapping · Hydrocarbon exploration and reservoir analysis
