# The NP-hard problem of computing the maximal sample variance over interval data is solvable in almost linear time with high probability

**Authors:** Miroslav Rada, Michal Černý, Ondřej Sokol

arXiv: 1905.07821 · 2022-07-28

## TL;DR

This paper improves the algorithm of Ferson et al. for computing the maximal sample variance over interval data, shows that it runs in near-linear time on average despite the problem's NP-hardness, and bounds the probability of hard instances that force exponential running time.

## Contribution

It introduces a more efficient version of the algorithm of Ferson et al. (2005) and analyzes its average-case complexity under a probabilistic data-generating model, demonstrating near-linear average running time.
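The step from the average-case bound on $2^\omega$ to near-linear time can be written out briefly. This is a sketch that takes the abstract's bound $2^\omega = O(n^{1/\log\log n})$ on average as given; the expectation symbol $\mathbb{E}$ is notation assumed here, not necessarily the paper's.

$$
\mathbb{E}\left[n\log n + n\cdot 2^{\omega}\right]
= n\log n + n\cdot \mathbb{E}\left[2^{\omega}\right]
= O\left(n\log n + n^{1 + 1/\log\log n}\right).
$$

Since $1/\log\log n \to 0$ as $n \to \infty$, for any fixed $\epsilon > 0$ we have $1/\log\log n \le \epsilon$ for all sufficiently large $n$. The term $n\log n$ is also $O(n^{1+\epsilon})$, so the whole expression is $O(n^{1+\epsilon})$.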

## Key findings

- Algorithm's time complexity improved to $O(n \log n + n \cdot 2^\omega)$
- Average complexity is $O(n^{1+\epsilon})$ under the probabilistic model
- Rare occurrence of instances requiring exponential time, with probability decaying as $e^{-n \log \log n}$
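The quantity $\omega$ in these bounds is the clique number of an intersection graph built from the input. For closed intervals on the real line, any set of pairwise intersecting intervals shares a common point, so the clique number equals the largest number of intervals covering a single point, which a sort-and-sweep finds in $O(n \log n)$ time. The following is a minimal sketch of that sweep only. Which intervals the paper feeds into the graph is not stated in this summary, so the function takes arbitrary `(lower, upper)` pairs, and the name `interval_clique_number` is hypothetical.

```python
def interval_clique_number(intervals):
    """Clique number of the intersection graph of closed intervals.

    Assumption: `intervals` is a list of (lower, upper) pairs with
    lower <= upper. This illustrates the sweep only; it is not the
    paper's algorithm.
    """
    events = []
    for lower, upper in intervals:
        # Kind 0 (open) sorts before kind 1 (close) at equal coordinates,
        # so intervals that merely touch at a point count as intersecting.
        events.append((lower, 0))
        events.append((upper, 1))
    events.sort()

    depth = 0  # number of intervals covering the current sweep position
    best = 0   # largest depth seen so far
    for _, kind in events:
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best
```

A bound of the form $n \cdot 2^\omega$ then stays small whenever no point is covered by many intervals at once, which is the situation the average-case result describes.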

## Abstract

We consider the algorithm by Ferson et al. (Reliable Computing 11(3), p. 207-233, 2005) designed for solving the NP-hard problem of computing the maximal sample variance over interval data, motivated by robust statistics (in fact, the formulation can be written as a nonconvex quadratic program with a specific structure). First, we propose a new version of the algorithm improving its original time bound $O(n^2 2^\omega)$ to $O(n \log n+n\cdot 2^\omega)$, where $n$ is the number of input data and $\omega$ is the clique number in a certain intersection graph. Then we treat input data as random variables (as is usual in statistics) and introduce a natural probabilistic data generating model. We get $2^\omega = O(n^{1/\log\log n})$ and $\omega = O(\log n / \log\log n)$ on average. This results in average computing time $O(n^{1+\epsilon})$ for $\epsilon > 0$ arbitrarily small, which may be considered as "surprisingly good" average time complexity for solving an NP-hard problem. Moreover, we prove the following tail bound on the distribution of computation time: hard instances, forcing the algorithm to compute in time $2^{\Omega(n)}$, occur rarely, with probability tending to zero at the rate $e^{-n\log\log n}$.
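To make the problem statement concrete, here is a brute-force baseline, not the algorithm of Ferson et al. or its improved version. It relies on a standard fact: the sample variance is a convex quadratic function of the data vector, so its maximum over a box of intervals is attained at a vertex, that is, with every observation at one of its two endpoints. The sketch enumerates all $2^n$ endpoint choices, so it is only usable for checking small instances. The function name and the `(lower, upper)` input representation are assumptions made for illustration, and the $1/(n-1)$ normalization of `statistics.variance` is assumed (the maximizing point is the same under $1/n$).

```python
from itertools import product
from statistics import variance


def max_sample_variance_bruteforce(intervals):
    """Maximal sample variance over interval data by vertex enumeration.

    Assumption: `intervals` is a list of at least two (lower, upper)
    pairs. Runs in time exponential in len(intervals); a reference
    baseline only, not the paper's method.
    """
    best = 0.0
    # product(*intervals) yields every way of picking one endpoint
    # from each interval, i.e. every vertex of the box.
    for vertex in product(*intervals):
        best = max(best, variance(vertex))
    return best


if __name__ == "__main__":
    data = [(0.0, 1.0), (0.0, 1.0), (0.4, 0.6)]
    print(max_sample_variance_bruteforce(data))
```

The exponential cost of this enumeration is what the $n \cdot 2^\omega$ bound avoids: the exponent there is the clique number $\omega$ rather than the number of observations $n$.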

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.07821/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/1905.07821/full.md

---
Source: https://tomesphere.com/paper/1905.07821