TL;DR
This paper introduces a keypoint-based encoding method for video streaming that significantly reduces bandwidth and latency. It transmits body and face keypoints instead of pixels, and the receiver renders them as animated puppets in real time. The approach is especially useful under poor network conditions.
Contribution
The paper presents a novel keypoint-centric encoder and decoder for video streaming that use less bandwidth and have lower latency than traditional codecs, enabling real-time digital puppetry.
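To show what "transmitting keypoints instead of pixels" can look like on the wire, here is a minimal sketch of a keypoint encoder and decoder. The wire format (a frame index, a keypoint count, then each (x, y) quantized to 16 bits from normalized image coordinates) and the names `encode_frame` and `decode_frame` are hypothetical illustrations. They are not the paper's actual format.

```python
import struct
from typing import List, Tuple

# Hypothetical wire format (NOT the paper's): a 6-byte little-endian header
# (uint32 frame index, uint16 keypoint count), followed by one (x, y) pair per
# keypoint, each quantized from a normalized [0, 1] coordinate to uint16.
HEADER = struct.Struct("<IH")
POINT = struct.Struct("<HH")
SCALE = 65535  # largest uint16 value


def encode_frame(frame_idx: int, keypoints: List[Tuple[float, float]]) -> bytes:
    """Quantize normalized (x, y) keypoints and pack them into a byte payload."""
    payload = bytearray(HEADER.pack(frame_idx, len(keypoints)))
    for x, y in keypoints:
        # Clamp to [0, 1] so slightly out-of-frame detections still fit in uint16.
        qx = round(min(max(x, 0.0), 1.0) * SCALE)
        qy = round(min(max(y, 0.0), 1.0) * SCALE)
        payload += POINT.pack(qx, qy)
    return bytes(payload)


def decode_frame(payload: bytes) -> Tuple[int, List[Tuple[float, float]]]:
    """Unpack a payload produced by encode_frame back into normalized keypoints."""
    frame_idx, count = HEADER.unpack_from(payload, 0)
    points = []
    for i in range(count):
        qx, qy = POINT.unpack_from(payload, HEADER.size + i * POINT.size)
        points.append((qx / SCALE, qy / SCALE))
    return frame_idx, points


if __name__ == "__main__":
    data = encode_frame(7, [(0.25, 0.5), (0.75, 0.1)])
    idx, decoded = decode_frame(data)
    print(len(data), idx, decoded)  # 14 bytes: 6-byte header + 2 points x 4 bytes
```

With this layout, each frame costs 6 bytes plus 4 bytes per keypoint, independent of video resolution. The round-trip error is at most half a quantization step, about 7.6e-6 of the image width or height. A real system might shrink the payload further, for example with coarser quantization or delta coding between frames. The summary does not say which techniques the paper uses.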
Findings
The bandwidth requirement is below 35 kbps, an order of magnitude lower than that of typical systems.
Computational latency for mesh extraction and animation is under 120 ms on a standard laptop.
A prototype demonstrates effective real-time video communication that preserves the semantic features of the input feed.
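A quick way to see why a keypoint stream can fit under 35 kbps is to multiply the bits per frame by the frame rate. The sketch below does that arithmetic. Its parameter values (100 keypoints, 2D coordinates, 8 bits per coordinate, 20 fps) are illustrative assumptions, not the paper's configuration. The helper names `keypoint_bitrate_kbps` and `max_keypoints` are hypothetical.

```python
def keypoint_bitrate_kbps(num_keypoints: int, dims: int, bits_per_coord: int,
                          fps: float, header_bits: int = 0) -> float:
    """Raw bitrate (kbps) of sending every keypoint coordinate on every frame."""
    bits_per_frame = num_keypoints * dims * bits_per_coord + header_bits
    return bits_per_frame * fps / 1000.0


def max_keypoints(budget_kbps: float, dims: int, bits_per_coord: int,
                  fps: float, header_bits: int = 0) -> int:
    """Largest keypoint count whose raw stream fits within budget_kbps."""
    bits_per_frame = budget_kbps * 1000.0 / fps - header_bits
    return int(bits_per_frame // (dims * bits_per_coord))


if __name__ == "__main__":
    # Illustrative settings only: 100 keypoints, 2D, 8-bit coordinates, 20 fps.
    print(keypoint_bitrate_kbps(100, 2, 8, 20))  # 32.0 kbps
    print(max_keypoints(35, 2, 8, 20))           # 109 keypoints fit in 35 kbps
```

Under these assumptions, the raw stream costs 100 × 2 × 8 = 1,600 bits per frame, or 32 kbps at 20 fps. That is already within the reported budget before any entropy coding. Dense face meshes, higher frame rates, or finer quantization raise the cost linearly. This is why any such budget depends on how many keypoints are sent and how they are compressed.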
Abstract
COVID-19 has made video communication one of the most important modes of information exchange. While extensive research has been conducted on the optimization of the video streaming pipeline, in particular the development of novel video codecs, further improvement in the video quality and latency is required, especially under poor network conditions. This paper proposes an alternative to the conventional codec through the implementation of a keypoint-centric encoder relying on the transmission of keypoint information from within a video feed. The decoder uses the streamed keypoints to generate a reconstruction preserving the semantic features in the input feed. Focusing on video calling applications, we detect and transmit the body pose and face mesh information through the network, which are displayed at the receiver in the form of animated puppets. Using efficient pose and face mesh…
