Towards Kinetic Manipulation of the Latent Space

Diego Porres

arXiv:2409.09867·cs.CV·November 12, 2024

TL;DR

This paper introduces Visual-reactive Interpolation, a novel method that uses live RGB camera feeds and CNN feature extraction to manipulate the latent space of generative models, offering a simple, hardware-efficient alternative to traditional GUI-based tools.

Contribution

It proposes a new paradigm for latent space manipulation using real-time scene changes and CNN features, bypassing complex interfaces and specialized hardware.

Findings

1. CNN features from live camera feeds effectively manipulate latent space
2. Simple scene changes lead to meaningful latent space interpolation
3. Code implementation is publicly available for reproducibility

Abstract

The latent spaces of many generative models are rich in unexplored valleys and mountains. The majority of tools used for exploring them are so far limited to Graphical User Interfaces (GUIs). While specialized hardware can be used for this task, we show that simple feature extraction with pre-trained Convolutional Neural Networks (CNNs) from a live RGB camera feed does a very good job at manipulating the latent space with simple changes in the scene, with vast room for improvement. We name this new paradigm Visual-reactive Interpolation, and the full code can be found at https://github.com/PDillis/stylegan3-fun.
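
The abstract does not spell out how CNN features are turned into latent vectors, so the following is only a minimal sketch of the general idea under stated assumptions, not the stylegan3-fun implementation. It assumes a 512-dimensional latent, a fixed random projection from features to latents, and spherical interpolation to ease between frames. The names `make_projection`, `features_to_latent`, `slerp`, and `react` are hypothetical, and the per-frame CNN features are replaced by synthetic vectors so that no camera or network is needed.

```python
import numpy as np

LATENT_DIM = 512  # assumed latent size, typical of StyleGAN-family models


def make_projection(feature_dim, latent_dim=LATENT_DIM, seed=0):
    """Hypothetical fixed random matrix mapping CNN features to latent space."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((latent_dim, feature_dim)) / np.sqrt(feature_dim)


def features_to_latent(features, projection):
    """Standardize a 1-D feature vector, then project it to a latent vector."""
    feats = np.asarray(features, dtype=np.float64)
    feats = (feats - feats.mean()) / (feats.std() + 1e-8)
    return projection @ feats


def slerp(z0, z1, t):
    """Spherical linear interpolation from z0 (t=0) to z1 (t=1)."""
    u0 = z0 / (np.linalg.norm(z0) + 1e-12)
    u1 = z1 / (np.linalg.norm(z1) + 1e-12)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    sin_omega = np.sin(omega)
    if sin_omega < 1e-6:  # nearly parallel (or opposite): fall back to lerp
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / sin_omega


def react(feature_stream, projection, step=0.2):
    """Yield one latent per frame, easing toward each frame's target latent."""
    current = None
    for features in feature_stream:
        target = features_to_latent(features, projection)
        current = target if current is None else slerp(current, target, step)
        yield current


if __name__ == "__main__":
    # Synthetic stand-in for per-frame CNN features (assumption: 256-dim vectors).
    rng = np.random.default_rng(1)
    fake_features = [rng.random(256) for _ in range(5)]
    proj = make_projection(feature_dim=256)
    for i, z in enumerate(react(fake_features, proj)):
        print(i, z.shape, round(float(np.linalg.norm(z)), 3))
```

In a real setup, each element of `feature_stream` would presumably be a pooled activation vector from a pre-trained CNN applied to the current camera frame, and each yielded latent would be passed to the generator. The `step` parameter controls how quickly the output follows changes in the scene.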

Code & Models

Repositories

pdillis/stylegan3-fun
PyTorch · Official

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Natural Language Processing Techniques