Tree-aggregated regression for compositional data with measurement errors
Zhenghan Li, Tianying Wang

TL;DR
This paper introduces TARCO, a novel tree-aggregated regression method that corrects for measurement errors in high-dimensional compositional data, improving estimation accuracy and interpretability in applications like microbiome studies.
Contribution
The paper develops a convex optimization approach for tree-aggregated compositional regression that accounts for the hierarchical contamination arising when leaf-level measurement error propagates through tree aggregation, with theoretical guarantees and scalable algorithms.
Findings
TARCO outperforms existing methods in simulation studies.
The method achieves better support recovery and interpretability.
Application to microbiome data demonstrates practical advantages.
Abstract
High-dimensional compositional covariates, often derived from count data, are subject to measurement error and are frequently analyzed after aggregation along a prespecified tree to improve interpretability in applications such as microbiome studies. Existing approaches typically handle either tree-guided compositional regression or errors-in-variables correction, but they do not account for the hierarchical contamination induced by their interaction. We show that tree aggregation turns leaf-level measurement error into level-dependent, correlated contamination across aggregated nodes, which inflates bias, weakens concentration rates for corrected estimating quantities, and leads to unstable variable selection for naive approaches. We propose Tree-Aggregated Regression with Correction for Observation Error (TARCO), which integrates bias-corrected estimating quantities with a…
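The claim that aggregation turns leaf-level error into level-dependent, correlated contamination can be illustrated with a toy calculation. The sketch below is not the paper's model or the TARCO estimator. It assumes additive, independent, equal-variance errors on four leaves and a hypothetical two-level tree, with an aggregation matrix `A`, a variance `sigma2`, and a helper `aggregated_error_cov` that are all made up for illustration. Under those assumptions, if the observed leaves are W = X + U, then the aggregated features are WA = XA + UA, and the error on the aggregated nodes has covariance A^T Cov(U) A.

```python
import numpy as np

# Hypothetical tree (illustration only): root -> {n1: leaves 0, 1; n2: leaves 2, 3}.
# A[j, k] = 1 if leaf j is a descendant of node k. Columns: [n1, n2, root].
A = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
], dtype=float)

# Assumed leaf-level error model: independent errors with common variance sigma2.
sigma2 = 0.5
Sigma_u = sigma2 * np.eye(A.shape[0])


def aggregated_error_cov(A: np.ndarray, Sigma_u: np.ndarray) -> np.ndarray:
    """Covariance of the aggregated error U @ A when Cov(U) = Sigma_u."""
    return A.T @ Sigma_u @ A


Sigma_agg = aggregated_error_cov(A, Sigma_u)

# Diagonal: error variance per node, which grows with the number of leaves below it.
node_variances = np.diag(Sigma_agg)

# Off-diagonal: nodes that share leaves (n1 and root, n2 and root) have correlated errors,
# while the disjoint siblings n1 and n2 stay uncorrelated.
cov_n1_root = Sigma_agg[0, 2]
cov_n1_n2 = Sigma_agg[0, 1]
```

In this toy setting the root carries twice the error variance of its children, and each child's error is correlated with the root's even though the leaf errors are independent. A correction that treats aggregated features as if they had independent, homogeneous errors would therefore be misspecified.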
