Compressing Human Body Video with Interactive Semantics: A Generative Approach

Bolin Chen; Shanzhi Yin; Hanwei Zhu; Lingyu Zhu; Zihan Zhang; Jie Chen; Ru-Ling Liao; Shiqi Wang; Yan Ye

arXiv:2505.16152·eess.IV·May 23, 2025

Open Access

TL;DR

This paper introduces a novel generative approach for compressing human body videos that allows interactive control over semantics, achieving high-quality reconstruction at ultra-low bitrates and enabling future metaverse applications.

Contribution

The paper presents a new encoder-decoder framework that uses a 3D human model for semantic-based compression and interactivity, and reports promising compression performance against VVC at ultra-low bitrates.

Findings

01. Achieves promising compression at ultra-low bitrates compared to VVC and recent generative methods.

02. Enables interactive manipulation of human body videos without extra pre/post-processing.

03. Supports high-quality reconstruction through semantic-driven mesh evolution.

Abstract

In this paper, we propose to compress human body video with interactive semantics, which can facilitate video coding to be interactive and controllable by manipulating semantic-level representations embedded in the coded bitstream. In particular, the proposed encoder employs a 3D human model to disentangle nonlinear dynamics and complex motion of human body signal into a series of configurable embeddings, which are controllably edited, compactly compressed, and efficiently transmitted. Moreover, the proposed decoder can evolve the mesh-based motion fields from these decoded semantics to realize the high-quality human body video reconstruction. Experimental results illustrate that the proposed framework can achieve promising compression performance for human body videos at ultra-low bitrate ranges compared with the state-of-the-art video coding standard Versatile Video Coding (VVC) and…

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Data Compression Techniques · Video Coding and Compression Technologies