Scalable Pooled Time Series of Big Video Data from the Deep Web
Chris Mattmann, Madhav Sharan

TL;DR
This paper presents a scalable Hadoop-based implementation of the Pooled Time Series algorithm, enabling analysis of large-scale video datasets from the deep web, with applications in human trafficking investigations.
Contribution
It introduces a parallelized, scalable version of Pooled Time Series for large datasets, addressing challenges of processing big video data efficiently.
Findings
The Hadoop-based algorithm works well on a dataset of approximately 6800 videos collected from a deep web crawl.
The implementation maintains the properties of the original algorithm.
Issues found while running Pooled Time Series on larger datasets are identified, and solutions for them are discussed.
Abstract
We contribute a scalable implementation of Ryoo et al.'s Pooled Time Series algorithm from CVPR 2015. The updated algorithm has been evaluated on a large and diverse dataset of approximately 6800 videos collected from a crawl of the deep web related to human trafficking on DARPA's MEMEX effort. We describe the properties of Pooled Time Series and the motivation for using it to relate videos collected from the deep web. We highlight issues that we found while running Pooled Time Series on larger datasets and discuss solutions for those issues. Our solution centers on re-imagining Pooled Time Series as a Hadoop-based algorithm in which we compute portions of the eventual solution in parallel on large commodity clusters. We demonstrate that our new Hadoop-based algorithm works well on the 6800 video dataset and shares all of the properties described in the CVPR 2015 paper. We suggest…
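To make the idea of computing "portions of the eventual solution in parallel" concrete, here is a minimal Python sketch of one way such a decomposition could look. It is not the authors' Hadoop code. It assumes each video has already been reduced to a list of per-frame descriptor vectors, and it uses illustrative pooling operators (sum, max, and a count of increases) and cosine similarity, none of which are confirmed by the summary above. The names `pool_video`, `cosine`, and `similarity_pairs` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pool_video(frames):
    """Map-style step: pool one video's per-frame descriptors into a
    fixed-length vector. Each video is processed independently, so this
    step could run in parallel across a cluster.
    Pooling operators here are illustrative assumptions."""
    dims = len(frames[0])
    pooled = []
    for d in range(dims):
        series = [frame[d] for frame in frames]
        # Count how often the descriptor value increases between frames.
        rises = sum(1 for a, b in zip(series, series[1:]) if b > a)
        pooled.extend([float(sum(series)), float(max(series)), float(rises)])
    return pooled


def cosine(u, v):
    """Cosine similarity between two pooled vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_pairs(videos):
    """Pool every video, then score every unordered pair of videos.
    Each pair is also independent, so pairs could be split across workers."""
    pooled = {name: pool_video(frames) for name, frames in videos.items()}
    return {
        (a, b): cosine(pooled[a], pooled[b])
        for a, b in combinations(sorted(pooled), 2)
    }


# Toy input: three hypothetical videos with 2-dimensional frame descriptors.
videos = {
    "a.mp4": [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],
    "b.mp4": [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],
    "c.mp4": [[5.0, 0.0], [0.0, 3.0]],
}
scores = similarity_pairs(videos)
```

Because videos can have different frame counts but always pool to the same vector length, both the pooling step and the pairwise step split cleanly into independent units of work, which is the property a Hadoop-style job relies on.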
Taxonomy
Topics: Time Series Analysis and Forecasting · Data Visualization and Analytics · Video Analysis and Summarization
