Latent Space Imaging
Matheus Souza, Yidan Zheng, Kaizhang Kang, Yogeshwar Nath Mishra, Qiang Fu, Wolfgang Heidrich

TL;DR
Inspired by the data reduction performed by the human visual system, Latent Space Imaging encodes visual information directly into a generative model's latent space, significantly reducing hardware complexity and bandwidth for both imaging and downstream tasks.
Contribution
This work introduces a hardware prototype that encodes images into a generative model's latent space, achieving high compression ratios and demonstrating a novel integration of optics and software for efficient imaging.
Findings
Achieved compression ratios from 1:100 to 1:1000 during imaging.
Up to 1:16384 compression for downstream applications.
Demonstrated hardware feasibility with a single-pixel camera.
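To make the compression figures above concrete, here is a minimal sketch of the generic single-pixel measurement model, in which each reading is the inner product of one pattern with the scene, so the ratio is simply measurements to pixels. The 128x128 image size, the 16 random binary patterns, and the helper names single_pixel_measure and compression_ratio are assumptions chosen for illustration; they are not the paper's optical encoder or its settings.

```python
import numpy as np


def single_pixel_measure(image, patterns):
    """Simulate a single-pixel camera: one scalar reading per pattern.

    image:    (H, W) scene intensities
    patterns: (M, H, W) modulation patterns
    returns:  (M,) readings, each the sum of pattern * image
    """
    return np.tensordot(patterns, image, axes=([1, 2], [0, 1]))


def compression_ratio(num_pixels, num_measurements):
    """Pixels per measurement, so 1024.0 means a 1:1024 ratio."""
    return num_pixels / num_measurements


rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
H, W = 128, 128
M = 16

image = rng.random((H, W))
# Random binary masks stand in for whatever patterns a real system would use.
patterns = rng.integers(0, 2, size=(M, H, W)).astype(np.float64)

measurements = single_pixel_measure(image, patterns)
ratio = compression_ratio(image.size, measurements.size)
print(f"{image.size} pixels -> {measurements.size} measurements (1:{ratio:.0f})")
```

Under these assumed sizes, 16 readings of a 16384-pixel scene give a ratio close to the upper end of the imaging range listed above; a single reading of the same scene would correspond to 1:16384.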
Abstract
Digital imaging systems have traditionally relied on brute-force measurement and processing of pixels arranged on regular grids. In contrast, the human visual system performs significant data reduction from the large number of photoreceptors to the optic nerve, effectively encoding visual information into a low-bandwidth latent space representation optimized for brain processing. Inspired by this, we propose a similar approach to advance artificial vision systems. Latent Space Imaging introduces a new paradigm that combines optics and software to encode image information directly into the semantically rich latent space of a generative model. This approach substantially reduces bandwidth and memory demands during image capture and enables a range of downstream tasks focused on the latent space. We validate this principle through an initial hardware prototype based on a single-pixel…
Taxonomy
Topics: Medical Imaging Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
