MARMOT: Masked Autoencoder for Modeling Transient Imaging

Siyuan Shen; Ziheng Wang; Xingyue Peng; Suan Xia; Ruiqian Li; Shiying Li; Jingyi Yu

arXiv:2506.08470 · cs.CV · June 11, 2025

TL;DR

MARMOT is a self-supervised Transformer-based autoencoder trained on large NLOS transient datasets, enabling improved modeling and reconstruction of hidden objects in transient imaging applications.

Contribution

The paper introduces MARMOT, a novel masked autoencoder that leverages self-supervised learning on large datasets for modeling transient imaging in NLOS scenarios.

Findings

01. MARMOT outperforms state-of-the-art methods in NLOS transient imaging.

02. Pretraining on large datasets enhances downstream task performance.

03. MARMOT effectively predicts full transient measurements from partial data.

Abstract

Pretrained models have demonstrated impressive success in many modalities such as language and vision. Recent works facilitate the pretraining paradigm in imaging research. Transients are a novel modality, which are captured for an object as photon counts versus arrival times using a precisely time-resolved sensor. In particular for non-line-of-sight (NLOS) scenarios, transients of hidden objects are measured beyond the sensor's direct line of sight. Using NLOS transients, the majority of previous works optimize volume density or surfaces to reconstruct the hidden objects and do not transfer priors learned from datasets. In this work, we present a masked autoencoder for modeling transient imaging, or MARMOT, to facilitate NLOS applications. Our MARMOT is a self-supervised model pretrained on massive and diverse NLOS transient datasets. Using a Transformer-based encoder-decoder, MARMOT…
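The abstract is truncated before it describes the masking scheme, so the following is only a minimal sketch of the generic masked-autoencoder idea applied to transients, not the authors' implementation. It assumes a measurement is a list of scan points, each holding a histogram of photon counts over arrival-time bins, and that masking hides whole scan points at random. The function name `random_mask_scan_points`, the 75% mask ratio, and the toy data sizes are all hypothetical choices made for illustration.

```python
import random


def random_mask_scan_points(num_points, mask_ratio, seed=0):
    """Split scan-point indices into visible and masked sets.

    Hypothetical helper: the paper's actual masking strategy is not
    stated in the truncated abstract.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_points))
    rng.shuffle(indices)
    num_masked = int(num_points * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Toy transient measurement (assumed layout): 16 scan points,
# each a histogram of photon counts over 8 arrival-time bins.
toy_rng = random.Random(1)
transients = [[toy_rng.randint(0, 5) for _ in range(8)] for _ in range(16)]

visible_idx, masked_idx = random_mask_scan_points(len(transients), mask_ratio=0.75)

# An encoder would see only the visible histograms; a decoder would be
# trained to predict the histograms at the masked scan points.
visible_histograms = [transients[i] for i in visible_idx]
target_histograms = [transients[i] for i in masked_idx]
```

In this sketch the reconstruction targets are simply the held-out histograms, which mirrors the page's finding that the model predicts full transient measurements from partial data.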

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning