Controlling the False Split Rate in Tree-Based Aggregation
Simeng Shao, Jacob Bien, Adel Javanmard

TL;DR
This paper introduces a new error measure called the false split rate for tree-based data aggregation and proposes a multiple hypothesis testing method to control this error, demonstrated on financial and geographic data.
Contribution
It defines the false split rate and develops a testing algorithm that controls this error in tree-based aggregation, with applications to mean and regression coefficient aggregation.
Findings
In simulations, the method controls the false split rate.
Applied to stock volatility data, it aggregates companies along the industry and sector hierarchy.
Applied to NYC taxi fare data, it aggregates street blocks into neighborhoods and larger regions.
Abstract
In many domains, data measurements can naturally be associated with the leaves of a tree, expressing the relationships among these measurements. For example, companies belong to industries, which in turn belong to ever coarser divisions such as sectors; microbes are commonly arranged in a taxonomic hierarchy from species to kingdoms; street blocks belong to neighborhoods, which in turn belong to larger-scale regions. The problem of tree-based aggregation that we consider in this paper asks which of these tree-defined subgroups of leaves should really be treated as a single entity and which of these entities should be distinguished from each other. We introduce the "false split rate", an error measure that describes the degree to which subgroups have been split when they should not have been. We then propose a multiple hypothesis testing algorithm for tree-based aggregation, which we…
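To make the idea of a "false split" concrete, the sketch below compares an estimated grouping of leaves against a true grouping and counts how often a true group was broken into more than one piece. The function name `false_split_proportion`, the leaf labels, and the normalization by the number of estimated groups minus one are assumptions made for illustration; the paper's formal definition may differ in its details.

```python
from typing import Hashable, Iterable


def false_split_proportion(
    true_groups: Iterable[Iterable[Hashable]],
    est_groups: Iterable[Iterable[Hashable]],
) -> float:
    """Illustrative share of splits that cut through a true group.

    Both arguments are partitions of the same set of leaves. This is a
    hypothetical sketch, not necessarily the paper's exact definition.
    """
    true_sets = [set(g) for g in true_groups]
    est_sets = [set(g) for g in est_groups]

    # For each true group, count the estimated groups that overlap it.
    # A true group left intact overlaps exactly one estimated group, so
    # every overlap beyond the first counts as one excess split.
    excess_splits = sum(
        sum(1 for e in est_sets if e & t) - 1
        for t in true_sets
    )

    # Producing M estimated groups takes M - 1 splits in total.
    total_splits = len(est_sets) - 1
    if total_splits <= 0:
        return 0.0  # nothing was split, so nothing was falsely split
    return excess_splits / total_splits


# Hypothetical example: four leaves, two true groups.
true_partition = [{"a", "b"}, {"c", "d"}]
est_partition = [{"a"}, {"b"}, {"c", "d"}]  # {"a", "b"} was split needlessly
print(false_split_proportion(true_partition, est_partition))  # 0.5
```

In this example two splits were made to produce three estimated groups, and one of them separated leaves that belong together, giving a proportion of 0.5. An expectation of such a proportion over repeated data would be a rate in the sense the abstract describes.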
Taxonomy
Topics: Financial Risk and Volatility Modeling · Economic and Environmental Valuation · Advanced Statistical Methods and Models
