Towards Kinetic Manipulation of the Latent Space
Diego Porres

TL;DR
This paper introduces Visual-reactive Interpolation, a novel method that uses live RGB camera feeds and CNN feature extraction to manipulate the latent space of generative models, offering a simple, hardware-efficient alternative to traditional GUI-based tools.
Contribution
It proposes a new paradigm for latent space manipulation using real-time scene changes and CNN features, bypassing complex interfaces and specialized hardware.
Findings
CNN features from live camera feeds effectively manipulate latent space
Simple scene changes lead to meaningful latent space interpolation
Code implementation is publicly available for reproducibility
Abstract
The latent spaces of many generative models are rich in unexplored valleys and mountains. The majority of tools used for exploring them are so far limited to Graphical User Interfaces (GUIs). While specialized hardware can be used for this task, we show that a simple feature extraction of pre-trained Convolutional Neural Networks (CNNs) from a live RGB camera feed does a very good job at manipulating the latent space with simple changes in the scene, with vast room for improvement. We name this new paradigm Visual-reactive Interpolation, and the full code can be found at https://github.com/PDillis/stylegan3-fun.
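To make the idea in the abstract concrete, the following is a minimal sketch of how per-frame features might drive a latent vector. It is not taken from the linked repository. Everything in it is an assumption made for illustration: `pooled_features` is a hypothetical stand-in for a pre-trained CNN (it only averages pixel intensities over a grid), the random projection and the exponential smoothing are choices of this sketch rather than the paper's method, and the frames are synthetic instead of coming from a camera.

```python
import math
import random


def pooled_features(frame, grid=4):
    """Hypothetical stand-in for CNN features: mean intensity per grid cell."""
    height, width = len(frame), len(frame[0])
    features = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * height // grid, (gy + 1) * height // grid)
            cols = range(gx * width // grid, (gx + 1) * width // grid)
            cell = [frame[y][x] for y in rows for x in cols]
            features.append(sum(cell) / len(cell))
    return features


def make_projection(n_features, latent_dim, seed=0):
    """Fixed random matrix (assumption of this sketch) mapping features to a latent."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_features)
    return [[rng.gauss(0.0, scale) for _ in range(n_features)]
            for _ in range(latent_dim)]


def features_to_latent(features, projection):
    """Center the features, then project them into the latent dimension."""
    mean = sum(features) / len(features)
    centered = [f - mean for f in features]
    return [sum(w * c for w, c in zip(row, centered)) for row in projection]


def smooth(previous, current, alpha=0.2):
    """Exponential moving average so the latent drifts instead of jumping."""
    return [(1.0 - alpha) * p + alpha * c for p, c in zip(previous, current)]


# Synthetic "camera feed": a dark scene, then the left half lights up.
dark = [[0.0] * 8 for _ in range(8)]
lit = [[1.0 if x < 4 else 0.0 for x in range(8)] for _ in range(8)]

projection = make_projection(n_features=16, latent_dim=8)
latent = features_to_latent(pooled_features(dark), projection)
for frame in [dark, lit, lit, lit]:
    target = features_to_latent(pooled_features(frame), projection)
    latent = smooth(latent, target)
    print([round(v, 3) for v in latent])
```

In a real setup, the frames would come from a live RGB camera, the features from a pre-trained CNN, and each resulting latent vector would be passed to the generative model to render an image. The smoothing factor `alpha` trades responsiveness to scene changes against visual stability.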
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Natural Language Processing Techniques
