# Hierarchical Summarization of Metric Changes

**Authors:** Matthias Ruhl, Mukund Sundararajan, Qiqi Yan

arXiv: 1703.07795 · 2017-03-24

## TL;DR

Given a change in a metric that breaks down along several hierarchical dimensions, this paper gives an algorithm that identifies a small set of non-overlapping data segments accounting for the change. The algorithm has been applied within Google Adwords to help advertisers triage the performance of their advertising campaigns.

## Contribution

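The key contribution is an algorithm that mimics the operation of a hierarchical organization of analysts. It is optimal for two dimensions and has an approximation ratio of $\log^{d-2}(n+1)$ for $d \geq 3$ dimensions, where $n$ is the number of input data segments. The paper also identifies a data pattern called a *conflict*, which guides the design of the algorithm and plays a central role in the hardness results.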

## Key findings

- The algorithm is optimal for two dimensions.
- For $d \geq 3$ dimensions it has an approximation ratio of $\log^{d-2}(n+1)$, where $n$ is the number of input data segments.
- For the Adwords application, the algorithm is a 2-approximation.
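
To get a feel for how the general guarantee scales, the following minimal sketch evaluates the upper bound $\log^{d-2}(n+1)$ next to the lower bound of $1.144^{d-2}$ stated in the abstract. The function names are hypothetical, and the base-2 logarithm is an assumption: this summary does not state the base.

```python
import math


def approx_ratio_upper(n: int, d: int) -> float:
    """Upper bound on the approximation ratio: log^{d-2}(n+1).

    Assumption: the logarithm is base 2 (the summary does not state the base).
    For d == 2 the exponent is 0, so the bound is 1, matching optimality.
    """
    if d < 2:
        raise ValueError("the bound is stated for d >= 2 dimensions")
    if n < 1:
        raise ValueError("n must be a positive number of data segments")
    return math.log2(n + 1) ** (d - 2)


def stated_lower_bound(d: int) -> float:
    """Lower bound 1.144^{d-2} for the algorithm, stated for d >= 3."""
    if d < 3:
        raise ValueError("the lower bound is stated for d >= 3 dimensions")
    return 1.144 ** (d - 2)


if __name__ == "__main__":
    n = 1023  # hypothetical number of input data segments
    for d in (3, 4, 5):
        print(d, approx_ratio_upper(n, d), stated_lower_bound(d))
```

Under the base-2 assumption, $n = 1023$ and $d = 3$ give an upper bound of 10, and each additional dimension multiplies it by another factor of 10, while the lower bound grows by a factor of 1.144 per dimension.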

## Abstract

We study changes in metrics that are defined on a cartesian product of trees. Such metrics occur naturally in many practical applications, where a global metric (such as revenue) can be broken down along several hierarchical dimensions (such as location, gender, etc).

Given a change in such a metric, our goal is to identify a small set of non-overlapping data segments that account for the change. An organization interested in improving the metric can then focus their attention on these data segments.

Our key contribution is an algorithm that mimics the operation of a hierarchical organization of analysts. The algorithm has been successfully applied, for example within Google Adwords to help advertisers triage the performance of their advertising campaigns.

We show that the algorithm is optimal for two dimensions, and has an approximation ratio $\log^{d-2}(n+1)$ for $d \geq 3$ dimensions, where $n$ is the number of input data segments. For the Adwords application, we can show that our algorithm is in fact a $2$-approximation.

Mathematically, we identify a certain data pattern called a *conflict* that both guides the design of the algorithm, and plays a central role in the hardness results. We use these conflicts to both derive a lower bound of $1.144^{d-2}$ (again $d\geq3$) for our algorithm, and to show that the problem is NP-hard, justifying the focus on approximation.
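
The problem setup in the abstract can be made concrete with a small sketch. It is not the paper's algorithm. It only models a data segment as one node per hierarchical dimension and checks whether a set of segments is non-overlapping. The trees, node names, and function names are hypothetical. The overlap rule used here (two segments share data exactly when, in every dimension, one node is an ancestor of or equal to the other) is an assumption that follows from treating each segment as the product of the subtrees under its nodes.

```python
from itertools import combinations

# Hypothetical hierarchies, each stored as a child -> parent map (root maps to None).
LOCATION = {"World": None, "US": "World", "EU": "World", "CA": "US", "NY": "US"}
DEVICE = {"All": None, "Mobile": "All", "Desktop": "All"}
TREES = (LOCATION, DEVICE)


def is_ancestor_or_equal(tree, a, b):
    """Return True if node a equals node b or lies on b's path to the root."""
    while b is not None:
        if a == b:
            return True
        b = tree[b]
    return False


def comparable(tree, a, b):
    """Two subtrees of a tree intersect only if one node is above (or is) the other."""
    return is_ancestor_or_equal(tree, a, b) or is_ancestor_or_equal(tree, b, a)


def overlaps(seg1, seg2, trees=TREES):
    """Segments overlap when their nodes are comparable in every dimension."""
    return all(comparable(t, a, b) for t, a, b in zip(trees, seg1, seg2))


def is_non_overlapping(segments, trees=TREES):
    """Check that no pair of segments in the candidate summary overlaps."""
    return not any(overlaps(s, t, trees) for s, t in combinations(segments, 2))


if __name__ == "__main__":
    summary = [("CA", "All"), ("NY", "Mobile"), ("EU", "Desktop")]
    print(is_non_overlapping(summary))
    print(overlaps(("US", "Mobile"), ("CA", "All")))
```

In this toy data, `("US", "Mobile")` and `("CA", "All")` overlap because both contain California mobile traffic, so a valid summary could include at most one of them.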

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.07795/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1703.07795/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1703.07795/full.md

---
Source: https://tomesphere.com/paper/1703.07795