Learning Visually Interpretable Oscillator Networks for Soft Continuum Robots from Video
Henrik Krauss, Johann Licher, Naoya Takeishi, Annika Raatz, Takehisa Yairi

TL;DR
This paper introduces a novel, interpretable deep learning framework for modeling soft continuum robot dynamics from video. It combines attention mechanisms and latent oscillators to achieve accurate prediction together with visual and mechanical interpretability.
Contribution
The paper presents ABCD and VONs, new modules enabling visual interpretability and mechanical understanding in data-driven soft robot models, surpassing prior methods in accuracy and interpretability.
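The paper describes ABCD as producing one attention map per latent dimension while filtering out static background. One plausible way to get maps with that property is similar in spirit to mixture-style decoders. It applies a per-pixel softmax over K latent channels plus one extra background channel, so every pixel's weights sum to 1 and each latent's weight map can be overlaid on the image. This is only a hypothetical sketch. The paper's actual decoder layers, resolution, and background handling are not given in this summary, and the names `pixel_softmax` and `compose` are illustrative.

```python
import math
from typing import List

Grid = List[List[float]]  # grid[i][j] = value at pixel (i, j)


def pixel_softmax(logit_maps: List[Grid]) -> List[Grid]:
    """Softmax across channels independently at every pixel.

    logit_maps[c][i][j] is the logit of channel c at pixel (i, j).
    Channels 0..K-1 stand for latent dimensions; the last channel is a
    hypothetical background channel (an assumption, not the paper's design).
    """
    n_ch = len(logit_maps)
    h, w = len(logit_maps[0]), len(logit_maps[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(n_ch)]
    for i in range(h):
        for j in range(w):
            vals = [logit_maps[c][i][j] for c in range(n_ch)]
            m = max(vals)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in vals]
            total = sum(exps)
            for c in range(n_ch):
                out[c][i][j] = exps[c] / total
    return out


def compose(attn: List[Grid], contents: List[Grid]) -> Grid:
    """Blend per-channel content images, weighting each pixel by its attention."""
    h, w = len(attn[0]), len(attn[0][0])
    return [
        [sum(attn[c][i][j] * contents[c][i][j] for c in range(len(attn))) for j in range(w)]
        for i in range(h)
    ]
```

The weights at each pixel form a distribution over channels. A pixel dominated by the background channel therefore contributes almost nothing to any latent's map, which is one way static background could be filtered out of the per-latent overlays.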
Findings
ABCD reduces multi-step prediction error by 5.8x for Koopman operators.
VONs discover a chain structure of oscillators autonomously.
Models achieve 3.5x error reduction on a two-segment robot.
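Multi-step prediction with a Koopman-style latent model means encoding only the first frame and then repeatedly applying a linear latent map, feeding each prediction back in. This sketch assumes the common controlled form z_{t+1} = A z_t + B u_t and a mean Euclidean error over the rolled-out steps. The matrices, the function names, and the error metric are illustrative assumptions. The paper's exact parameterization and evaluation metric are not stated in this summary.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def koopman_rollout(A: Matrix, B: Matrix, z0: Vector, inputs: Sequence[Vector]) -> List[Vector]:
    """Roll z_{t+1} = A z_t + B u_t forward, feeding each prediction back in.

    Returns [z0, z1, ..., zT] for T = len(inputs); no intermediate re-encoding.
    """
    traj = [list(z0)]
    for u in inputs:
        z = traj[-1]
        traj.append([x + y for x, y in zip(matvec(A, z), matvec(B, u))])
    return traj


def mean_multistep_error(pred: List[Vector], true: List[Vector]) -> float:
    """Mean Euclidean error over predicted steps 1..T (step 0 is the given state).

    Assumes at least one predicted step.
    """
    errs = [
        sum((p - t) ** 2 for p, t in zip(pv, tv)) ** 0.5
        for pv, tv in zip(pred[1:], true[1:])
    ]
    return sum(errs) / len(errs)
```

Because each step consumes the previous prediction, small errors in A or in the encoded state compound over the horizon. That compounding is why multi-step error is a stricter test than one-step error.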
Abstract
Learning soft continuum robot (SCR) dynamics from video offers flexibility, but existing methods lack interpretability or rely on prior assumptions. Model-based approaches require prior knowledge and manual design. We bridge this gap by introducing: (1) The Attention Broadcast Decoder (ABCD), a plug-and-play module for autoencoder-based latent dynamics learning that generates pixel-accurate attention maps localizing each latent dimension's contribution while filtering static backgrounds, enabling visual interpretability via spatially grounded latents and on-image overlays. (2) Visual Oscillator Networks (VONs), a 2D latent oscillator network coupled to ABCD attention maps for on-image visualization of learned masses, coupling stiffness, and forces, enabling mechanical interpretability. We validate our approach on single- and double-segment SCRs, demonstrating that ABCD-based models…
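VONs are described as a 2D latent oscillator network with learned masses, coupling stiffness, and forces that can be drawn on the image. The sketch below assumes one simple form of such a network. Each node i is a 2D point q_i with mass m_i, an anchor spring k_i toward the origin, viscous damping d_i, an external force f_i, and linear couplings k_ij to other nodes. The resulting equation is m_i q_i'' = -k_i q_i - d_i q_i' - sum_j k_ij (q_i - q_j) + f_i, integrated with semi-implicit Euler. These equations, the class `OscillatorNetwork`, and its parameter names are hypothetical illustrations, not the paper's formulation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec2 = Tuple[float, float]


@dataclass
class OscillatorNetwork:
    """Hypothetical 2D coupled-oscillator network (one node per 2D latent point)."""
    masses: List[float]            # m_i > 0
    anchor_stiffness: List[float]  # k_i: spring pulling node i toward the origin
    damping: List[float]           # d_i: viscous damping on node i
    coupling: Dict[Tuple[int, int], float] = field(default_factory=dict)  # k_ij per node pair

    def forces(self, q: List[Vec2], v: List[Vec2], f_in: List[Vec2]) -> List[Vec2]:
        """Total force on every node: anchor spring, damping, input, couplings."""
        out = []
        for i in range(len(q)):
            fx = -self.anchor_stiffness[i] * q[i][0] - self.damping[i] * v[i][0] + f_in[i][0]
            fy = -self.anchor_stiffness[i] * q[i][1] - self.damping[i] * v[i][1] + f_in[i][1]
            out.append([fx, fy])
        # Pairwise linear springs apply equal and opposite forces to i and j.
        for (i, j), k in self.coupling.items():
            dx = q[i][0] - q[j][0]
            dy = q[i][1] - q[j][1]
            out[i][0] -= k * dx
            out[i][1] -= k * dy
            out[j][0] += k * dx
            out[j][1] += k * dy
        return [(fx, fy) for fx, fy in out]

    def step(self, q: List[Vec2], v: List[Vec2], f_in: List[Vec2], dt: float) -> Tuple[List[Vec2], List[Vec2]]:
        """Semi-implicit (symplectic) Euler: update velocities first, then positions."""
        F = self.forces(q, v, f_in)
        v_new = [
            (vx + dt * fx / m, vy + dt * fy / m)
            for (vx, vy), (fx, fy), m in zip(v, F, self.masses)
        ]
        q_new = [(qx + dt * vx, qy + dt * vy) for (qx, qy), (vx, vy) in zip(q, v_new)]
        return q_new, v_new
```

In this toy setup, the chain structure that VONs reportedly discover would appear as nonzero k_ij only between neighboring nodes, for example `coupling={(0, 1): 4.0, (1, 2): 4.0}`. Each learned quantity is attached to a node or a node pair. Overlaying the nodes on the image therefore lets masses, stiffnesses, and forces be visualized directly.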
