Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

Shishira R Maiya; Anubhav Gupta; Matthew Gwilliam; Max Ehrlich; Abhinav Shrivastava

arXiv:2408.02672 · cs.CV · August 6, 2024

Open Access

TL;DR

Latent-INR introduces a flexible video implicit neural representation framework that maintains compression efficiency while embedding semantic properties into latents, enabling retrieval, interpolation, and open-ended interaction.

Contribution

The paper proposes a novel decoupled INR framework with learned latents aligned to vision models, integrating compression with semantic understanding and downstream tasks.

Findings

01. Effective video compression with semantic latents

02. Latents aligned with CLIP enable retrieval tasks

03. Latents support video interpolation and super-resolution
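
The retrieval finding can be illustrated with a minimal sketch, assuming that per-frame latents and a query embedding live in the same vector space and are compared by cosine similarity. The function name `retrieve` and the toy 3-dimensional vectors are hypothetical stand-ins; in practice the query would come from a text or image encoder such as CLIP, and this is not the authors' implementation.

```python
import numpy as np

def retrieve(query, frame_latents, top_k=1):
    """Return (indices, scores) of the top_k latents most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    lat = frame_latents / np.linalg.norm(frame_latents, axis=1, keepdims=True)
    scores = lat @ q                      # cosine similarity per frame
    order = np.argsort(-scores)[:top_k]   # highest similarity first
    return order.tolist(), scores[order].tolist()

# Toy data (assumption): three frame latents and one query embedding.
frame_latents = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
])
query = np.array([0.0, 2.0, 0.0])

indices, scores = retrieve(query, frame_latents, top_k=2)
print(indices)  # frame 1 ranks first, then frame 2
```

Normalizing both sides makes the ranking independent of vector magnitude, which is the usual choice when comparing embeddings from contrastively trained models.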

Abstract

Implicit Neural Representations (INRs) have emerged as a powerful way to encode all forms of data, including images, videos, audio, and scenes. Many video INRs have been proposed for the compression task, and recent methods feature significant improvements in encoding time, storage, and reconstruction quality. However, these encoded representations lack semantic meaning, so they cannot be used for downstream tasks that require such properties, such as retrieval. This can act as a barrier to the adoption of video INRs over traditional codecs, as they do not offer any significant edge apart from compression. To alleviate this, we propose a flexible framework that decouples the spatial and temporal aspects of the video INR. We accomplish this with a dictionary of per-frame latents that are learned jointly with a set of video-specific hypernetworks, such…
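
The decoupling described in the abstract can be pictured with a rough forward-pass sketch. It assumes one latent vector per frame (the temporal part) and a single linear hypernetwork that turns that latent into the weights of one layer of a coordinate MLP (the spatial part). All sizes, names (`latents`, `hyper_W`, `decode_frame`), and the sine activation are illustrative assumptions, the weights are random rather than trained, and the paper's actual architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_FRAMES, LATENT_DIM, HIDDEN = 8, 16, 32  # hypothetical sizes

# Dictionary of per-frame latents: one vector per frame (learned in practice).
latents = rng.normal(size=(NUM_FRAMES, LATENT_DIM))

# Hypernetwork (assumed linear): latent -> weights of one hidden INR layer.
hyper_W = rng.normal(scale=0.1, size=(LATENT_DIM, HIDDEN * HIDDEN))

# Frame-independent input and output layers of the coordinate MLP.
W_in = rng.normal(size=(2, HIDDEN))
W_out = rng.normal(scale=0.1, size=(HIDDEN, 3))

def decode_frame(frame_idx, height, width):
    """Render one frame: latent -> layer weights -> MLP over pixel coordinates."""
    z = latents[frame_idx]
    W_mid = (z @ hyper_W).reshape(HIDDEN, HIDDEN)  # frame-specific weights
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height),
        np.linspace(-1.0, 1.0, width),
        indexing="ij",
    )
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1)  # (H*W, 2)
    h = np.sin(coords @ W_in)    # spatial features shared by all frames
    h = np.sin(h @ W_mid)        # layer whose weights depend on the frame latent
    rgb = 1.0 / (1.0 + np.exp(-(h @ W_out)))  # sigmoid keeps values in (0, 1)
    return rgb.reshape(height, width, 3)

frame = decode_frame(3, 4, 6)
print(frame.shape)
```

Because only the latent changes between frames, the same latent dictionary could in principle also be used for tasks other than reconstruction, which is the flexibility the abstract points to.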

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training · ALIGN