Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
Shishira R Maiya, Anubhav Gupta, Matthew Gwilliam, Max Ehrlich, Abhinav Shrivastava

TL;DR
Latent-INR introduces a flexible video implicit neural representation framework that maintains compression efficiency while embedding semantic properties into latents, enabling retrieval, interpolation, and open-ended interaction.
Contribution
The paper proposes a novel decoupled INR framework with learned latents aligned to vision models, integrating compression with semantic understanding and downstream tasks.
Findings
Effective video compression with semantic latents
Latents aligned with CLIP enable retrieval tasks
Latents support video interpolation and superresolution
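The retrieval finding can be illustrated with a minimal sketch. It assumes that each frame latent and the query embedding already live in a shared space (the summary says latents are aligned with CLIP); the tiny 3-dimensional vectors and the function names `cosine_similarity` and `retrieve` are hypothetical stand-ins, not the authors' code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, frame_latents, k=1):
    """Return indices of the k frame latents most similar to the query."""
    ranked = sorted(
        range(len(frame_latents)),
        key=lambda i: cosine_similarity(query, frame_latents[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy data (assumption): three frame latents and one query embedding.
frame_latents = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [0.9, 0.1, 0.0]

top = retrieve(query, frame_latents, k=2)
print(top)
```

In a real setting the query would come from a text or image encoder and the latents from the trained dictionary; only the ranking step is shown here.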
Abstract
Implicit Neural Representations (INRs) have emerged as powerful representations to encode all forms of data, including images, videos, audio, and scenes. For video, many INRs have been proposed for the compression task, and recent methods feature significant improvements in encoding time, storage, and reconstruction quality. However, these encoded representations lack semantic meaning, so they cannot be used for any downstream tasks that require such properties, such as retrieval. This can act as a barrier to the adoption of video INRs over traditional codecs, as they do not offer any significant edge apart from compression. To alleviate this, we propose a flexible framework that decouples the spatial and temporal aspects of the video INR. We accomplish this with a dictionary of per-frame latents that are learned jointly with a set of video-specific hypernetworks, such…
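The decoupling described in the abstract (a dictionary of per-frame latents plus hypernetworks) can be sketched roughly as follows. This is an untrained toy under stated assumptions, not the paper's architecture: the single linear hypernetwork `hyper_W`, the sine activations, the layer sizes, and the function `decode_frame` are all hypothetical choices made for illustration, and the weights are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_FRAMES, LATENT_DIM, HIDDEN = 8, 16, 32  # illustrative sizes (assumption)

# Temporal part: a dictionary holding one latent vector per frame.
latents = rng.normal(size=(NUM_FRAMES, LATENT_DIM))

# Hypothetical linear hypernetwork: maps a frame latent to the weights
# of one hidden layer of the spatial network.
hyper_W = rng.normal(scale=0.1, size=(LATENT_DIM, HIDDEN * HIDDEN))

# Spatial part: input and output layers shared by all frames.
W_in = rng.normal(size=(2, HIDDEN))
W_out = rng.normal(scale=0.1, size=(HIDDEN, 3))

def decode_frame(t, height, width):
    """Render frame t as a (height, width, 3) array from pixel coordinates."""
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height),
        np.linspace(-1.0, 1.0, width),
        indexing="ij",
    )
    coords = np.stack([xs, ys], axis=-1).reshape(-1, 2)        # (H*W, 2)
    W_t = (latents[t] @ hyper_W).reshape(HIDDEN, HIDDEN)       # frame-specific weights
    h = np.sin(coords @ W_in)                                  # shared spatial layer
    h = np.sin(h @ W_t)                                        # layer generated from the latent
    rgb = 1.0 / (1.0 + np.exp(-(h @ W_out)))                   # sigmoid keeps values in (0, 1)
    return rgb.reshape(height, width, 3)

frame = decode_frame(3, 4, 6)
print(frame.shape)
```

The point of the split is that time enters only through `latents[t]`, so the latent for each frame is a compact vector that can be stored, compared, or blended independently of the spatial decoder.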
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training · ALIGN
