Physical Representation-based Predicate Optimization for a Visual Analytics Database
Michael R. Anderson, Michael Cafarella, German Ros, Thomas F. Wenisch

TL;DR
This paper introduces Tahoma, a method that jointly optimizes CNN architectures and input data representations, accelerating visual content queries by up to 98x over ResNet50 with no accuracy loss.
Contribution
Tahoma jointly optimizes CNN models and input transformations, leading to substantial speedups in visual content classification without compromising accuracy.
Findings
Up to 35x speedup from input transformations
Up to 98x speedup over ResNet50 with no accuracy loss
280x speedup with some accuracy trade-off
Abstract
Querying the content of images, video, and other non-textual data sources requires expensive content extraction methods. Modern extraction techniques are based on deep convolutional neural networks (CNNs) and can classify objects within images with astounding accuracy. Unfortunately, these methods are slow: processing a single image can take about 10 milliseconds on modern GPU-based hardware. As massive video libraries become ubiquitous, running a content-based query over millions of video frames is prohibitive. One promising approach to reduce the runtime cost of queries of visual content is to use a hierarchical model, such as a cascade, where simple cases are handled by an inexpensive classifier. Prior work has sought to design cascades that optimize the computational cost of inference by, for example, using smaller CNNs. However, we observe that there are critical factors besides…
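The cascade idea mentioned in the abstract can be illustrated with a minimal sketch. This is not Tahoma's implementation: the function name `cascade_predict`, the confidence thresholds `low` and `high`, and the toy classifiers are all assumptions made for illustration. Each cheap stage answers only when it is confident, and otherwise passes the input on to the next, more expensive stage.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a classifier maps an input to an estimated
# probability that the queried content is present.
Classifier = Callable[[float], float]


def cascade_predict(
    x: float,
    stages: Sequence[Classifier],
    final: Classifier,
    low: float = 0.1,   # assumed threshold: at or below this, answer "absent"
    high: float = 0.9,  # assumed threshold: at or above this, answer "present"
) -> Tuple[bool, int]:
    """Return (label, index of the stage that produced the answer)."""
    for i, clf in enumerate(stages):
        p = clf(x)
        if p <= low:
            return False, i
        if p >= high:
            return True, i
        # Otherwise the stage is unsure; fall through to the next one.
    # No cheap stage was confident, so the expensive model decides.
    return final(x) >= 0.5, len(stages)


# Toy stand-ins: the "image" is a single number in [0, 1].
def cheap(x: float) -> float:
    return x  # pretend the input is already a rough score


def expensive(x: float) -> float:
    return 1.0 if x > 0.5 else 0.0  # pretend this is the accurate model


if __name__ == "__main__":
    for value in (0.95, 0.05, 0.6, 0.4):
        print(value, cascade_predict(value, [cheap], expensive))
```

In this toy setup, inputs near 0 or 1 are settled by the cheap stage and only the ambiguous ones reach the expensive model. For a sense of scale, at the abstract's figure of about 10 milliseconds per image, sending one million frames through the full model alone would take about 10,000 seconds, so the fraction of frames resolved early largely determines the query cost.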
