Human-Machine Collaborative Video Coding Through Cuboidal Partitioning
Ashek Ahmmed, Manoranjan Paul, Manzur Murshed, and David Taubman

TL;DR
This paper introduces a novel video coding framework using cuboidal regions to efficiently encode critical information for machine vision tasks, achieving better detection accuracy and reduced bit rate compared to traditional methods.
Contribution
It proposes leveraging cuboidal feature descriptors for joint human and machine vision, improving object detection and communication efficiency in video coding.
Findings
Cuboidal features improve object detection precision.
The method reduces bit rate by 7%.
Experimental results validate the approach's effectiveness.
Abstract
Video coding algorithms encode and decode an entire video frame, while feature coding techniques preserve and communicate only the most critical information needed for a given application. This is because video coding targets human perception, while feature coding targets machine vision tasks. Recently, attempts have been made to bridge the gap between these two domains. In this work, we propose a video coding framework that leverages the commonality between human vision and machine vision applications using cuboids. Cuboids, estimated rectangular regions over a video frame, are computationally efficient, have a compact representation, and are object-centric. Such properties have already been shown to add value to traditional video coding systems. Herein, cuboidal feature descriptors are extracted from the current frame and then employed for accomplishing a machine…
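To make the idea of cuboids as rectangular regions over a frame concrete, the following is a minimal sketch of one way such a partition could be estimated. The abstract does not specify the partitioning procedure, so everything here is an assumption for illustration: the greedy rule (repeatedly apply the axis-aligned split that most reduces the summed squared deviation of pixel intensities), the function names `sse`, `best_split`, and `partition`, and the `(top, left, bottom, right)` rectangle layout are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: a greedy, variance-minimizing split is ASSUMED here;
# the paper's actual cuboid estimation procedure is not given in the abstract.

def sse(frame, top, left, bottom, right):
    """Sum of squared deviations from the mean over rows [top, bottom), cols [left, right)."""
    vals = [frame[r][c] for r in range(top, bottom) for c in range(left, right)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)


def best_split(frame, rect):
    """Return (cost, rect_a, rect_b) for the cheapest horizontal or vertical cut, or None."""
    top, left, bottom, right = rect
    candidates = []
    for r in range(top + 1, bottom):  # horizontal cuts
        candidates.append(((top, left, r, right), (r, left, bottom, right)))
    for c in range(left + 1, right):  # vertical cuts
        candidates.append(((top, left, bottom, c), (top, c, bottom, right)))
    best = None
    for a, b in candidates:
        cost = sse(frame, *a) + sse(frame, *b)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best


def partition(frame, n_cuboids):
    """Greedily split the frame into at most n_cuboids rectangles (hypothetical helper)."""
    rects = [(0, 0, len(frame), len(frame[0]))]
    while len(rects) < n_cuboids:
        chosen = None  # (gain, index, rect_a, rect_b)
        for i, rect in enumerate(rects):
            split = best_split(frame, rect)
            if split is None:  # a 1x1 rectangle cannot be split further
                continue
            gain = sse(frame, *rect) - split[0]
            if chosen is None or gain > chosen[0]:
                chosen = (gain, i, split[1], split[2])
        if chosen is None:
            break
        _, i, a, b = chosen
        rects[i:i + 1] = [a, b]
    return rects


if __name__ == "__main__":
    # Toy 4x4 "frame": dark left half, bright right half.
    frame = [[0, 0, 10, 10] for _ in range(4)]
    print(partition(frame, 2))
```

Each rectangle is described by four integers, which is one way to read the abstract's point about compact representation: on the toy frame, the two regions returned align with the dark and bright halves. A practical system would compute the split costs incrementally (for example with integral images) rather than by rescanning pixels as this sketch does.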
