# Scalable Panel Fusion Using Distributed Min Cost Flow

**Authors:** Swapnil Shinde, Jukka Ranta, Paul Deitrick, Matthew Malloy

arXiv: 1907.05458 · 2019-07-15

## TL;DR

This paper casts panel fusion for digital audience measurement as a min cost network flow problem, partitions the network into sub-problems that can be solved in a distributed computing environment, and gives conditions under which the resulting relaxed solution is optimal.

## Contribution

The paper formalizes the panel fusion problem, casts it as a network flow problem, and proposes an efficient algorithm that partitions the network into sub-problems so that large instances can be solved in a distributed computing environment.
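
As a rough illustration of what a network flow model of fusion can look like, the standard transportation form is shown below. The notation is an assumption made for this summary and is not taken from the paper: $w_i$ and $v_j$ are the weights of record $i$ in the first panel and record $j$ in the second, $c_{ij}$ is a dissimilarity cost for linking them, and $x_{ij}$ is the amount of weight linked. The paper's exact formulation may differ.

$$
\min_{x \ge 0} \; \sum_{i} \sum_{j} c_{ij}\, x_{ij}
\quad \text{subject to} \quad
\sum_{j} x_{ij} = w_i \;\; \forall i,
\qquad
\sum_{i} x_{ij} = v_j \;\; \forall j .
$$

This form is feasible only when the two panels carry the same total weight, $\sum_i w_i = \sum_j v_j$.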

## Key findings

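- Fuses two real-world panel datasets in a distributed computing environment
- Targets digital audience measurement, where panel sizes can grow into the tens of millions
- Gives conditions under which the partitioning algorithm, which solves a relaxed version of the original problem, is guaranteed to be optimal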
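
One easy special case helps build intuition for why partitioning can preserve optimality: if the network splits into disconnected components, a min cost flow problem separates exactly, and each component can be solved on a different worker. The sketch below finds such components with a union-find structure. It is an illustrative assumption only, not the paper's partitioning algorithm or its stated optimality conditions; the function name and the `candidate_pairs` input are hypothetical.

```python
def connected_components(n_a, n_b, candidate_pairs):
    """Group records of panel A (0..n_a-1) and panel B (0..n_b-1) so that
    no candidate pair (i, j) links two different groups.

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    # Node ids: A record i -> i, B record j -> n_a + j.
    parent = list(range(n_a + n_b))

    def find(x):
        # Find the root of x, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two endpoints of every allowed link.
    for i, j in candidate_pairs:
        root_a, root_b = find(i), find(n_a + j)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each component, keeping the two panels apart.
    groups = {}
    for node in range(n_a + n_b):
        a_side, b_side = groups.setdefault(find(node), ([], []))
        if node < n_a:
            a_side.append(node)
        else:
            b_side.append(node - n_a)
    return sorted(groups.values())


# Records a0, a1 may only link to b0; record a2 may link to b1 or b2.
parts = connected_components(3, 3, [(0, 0), (1, 0), (2, 1), (2, 2)])
print(parts)  # [([0, 1], [0]), ([2], [1, 2])]
```

Each returned pair of index lists defines an independent sub-problem. Real networks are rarely this cleanly separable, which is presumably why the paper works with a relaxed problem and states conditions for optimality.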

## Abstract

Modern audience measurement requires combining observations from disparate panel datasets. Connecting and relating such panel datasets is a process termed panel fusion. This paper formalizes the panel fusion problem and presents a novel approach to solve it. We cast the panel fusion as a network flow problem, allowing the application of a rich body of research. In the context of digital audience measurement, where panel sizes can grow into the tens of millions, we propose an efficient algorithm to partition the network into sub-problems. While the algorithm solves a relaxed version of the original problem, we provide conditions under which it guarantees optimality. We demonstrate our approach by fusing two real-world panel datasets in a distributed computing environment.
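
To make the network flow view concrete, here is a minimal sketch that fuses two tiny weighted panels by solving a min cost flow with successive shortest augmenting paths. Everything in it is an assumption for illustration: the function names, the integer weights and costs, and the choice of solver are not from the paper, which may use a different formulation and algorithm.

```python
def min_cost_flow(n_nodes, edges, source, sink, required_flow):
    """Successive shortest paths with Bellman-Ford on the residual graph.

    edges: list of (u, v, capacity, cost) with integer capacities and costs.
    Returns (total_cost, flow_on_each_input_edge).
    """
    # Residual edges are stored in pairs: edge e and its reverse e ^ 1.
    to, cap, cost = [], [], []
    adj = [[] for _ in range(n_nodes)]
    for u, v, c, w in edges:
        adj[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        adj[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)

    inf = float("inf")
    flow_sent, total_cost = 0, 0
    while flow_sent < required_flow:
        # Cheapest source -> sink path over edges with spare capacity.
        dist = [inf] * n_nodes
        prev_edge = [-1] * n_nodes
        dist[source] = 0
        for _ in range(n_nodes - 1):
            updated = False
            for u in range(n_nodes):
                if dist[u] == inf:
                    continue
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_edge[to[e]] = e
                        updated = True
            if not updated:
                break
        if dist[sink] == inf:
            raise ValueError("required flow is infeasible")

        # Bottleneck capacity along the path (to[e ^ 1] is the tail of e).
        push = required_flow - flow_sent
        v = sink
        while v != source:
            e = prev_edge[v]
            push = min(push, cap[e])
            v = to[e ^ 1]
        # Augment along the path.
        v = sink
        while v != source:
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        flow_sent += push
        total_cost += push * dist[sink]

    # Flow on input edge k equals the capacity accumulated on its reverse edge.
    return total_cost, [cap[2 * k + 1] for k in range(len(edges))]


def fuse_panels(weights_a, weights_b, cost):
    """Link records of panel A to records of panel B at minimum total cost."""
    if sum(weights_a) != sum(weights_b):
        raise ValueError("panel weights must sum to the same total")
    n_a, n_b = len(weights_a), len(weights_b)
    source, sink = 0, n_a + n_b + 1
    # Source feeds each A record with its weight.
    edges = [(source, 1 + i, w, 0) for i, w in enumerate(weights_a)]
    pairs = [(i, j) for i in range(n_a) for j in range(n_b)]
    # Every A record may link to every B record at cost[i][j].
    for i, j in pairs:
        edges.append((1 + i, 1 + n_a + j, min(weights_a[i], weights_b[j]), cost[i][j]))
    # Each B record drains its weight into the sink.
    edges += [(1 + n_a + j, sink, v, 0) for j, v in enumerate(weights_b)]
    total, flows = min_cost_flow(n_a + n_b + 2, edges, source, sink, sum(weights_a))
    linked = flows[n_a:n_a + len(pairs)]
    return total, {p: f for p, f in zip(pairs, linked) if f > 0}


total, plan = fuse_panels([3, 2], [1, 4], [[1, 4], [3, 2]])
print(total, plan)  # 13 {(0, 0): 1, (0, 1): 2, (1, 1): 2}
```

In the returned plan, key `(i, j)` maps to the amount of weight from record `i` of the first panel that is linked to record `j` of the second. A dense all-pairs network like this one does not scale to millions of records, which is the motivation the abstract gives for partitioning into sub-problems.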

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.05458/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1907.05458/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1907.05458/full.md

---
Source: https://tomesphere.com/paper/1907.05458