A Survey of Task-Based Machine Learning Content Extraction Services for VIDINT
Joshua Brunk, Nathan Jermann, Ryan Sharp, Carl D. Hoover

TL;DR
This survey compares current commercial video content extraction tools, highlighting advancements in machine learning tasks like transcription, translation, tagging, and recognition that enable scalable video analysis.
Contribution
It provides a comprehensive comparison of video content extraction services, emphasizing recent technological progress and the capabilities these services give organizations for analyzing video data effectively.
Findings
Machine learning tasks in video analytics are rapidly improving in speed and accuracy.
Commercial tools vary significantly in capabilities and performance.
The survey highlights key trends and gaps in current video content extraction services.
Abstract
This paper provides a comparison of current video content extraction tools, with a focus on commercial task-based machine learning services. Video intelligence (VIDINT) data has become a critical intelligence source in the past decade. The need for AI-based analytics and automation tools to extract and structure content from video has quickly become a priority for organizations needing to search, analyze, and exploit video at scale. With rapid growth in machine learning technology, the maturity of machine transcription, machine translation, topic tagging, and object recognition tasks is improving at an exponential rate, breaking performance records in speed and accuracy as new applications evolve. Each section of this paper reviews and compares products, software resources, and video analytics capabilities based on tasks relevant to extracting information from video with machine…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Video Analysis and Summarization · Machine Learning and Data Classification
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
