Exploring Temporal Representation in Neural Processes for Multimodal Action Prediction

Marco Gabriele Fedozzi; Yukie Nagai; Francesco Rea; Alessandra Sciutti

arXiv:2604.08418·cs.RO·April 10, 2026

TL;DR

This paper investigates the use of Conditional Neural Processes for multimodal action prediction in robotics, proposing a temporal encoding enhancement to improve generalization to unseen actions.

Contribution

The paper introduces DMBN-Positional Time Encoding (DMBN-PTE), a revision of the Deep Modality Blending Network (DMBN) that changes how the model represents time, with the aim of more robust action prediction on unseen action sequences.

Findings

01. DMBN can reconstruct visuo-motor signals from partially observed action sequences.

02. DMBN struggles to generalize to unseen action sequences; the authors trace this to its internal representation of time.

03. DMBN-PTE improves temporal robustness and prediction capabilities.

Abstract

Inspired by the human ability to understand and predict others, we study the applicability of Conditional Neural Processes (CNP) to the task of self-supervised multimodal action prediction in robotics. Following recent results regarding the ontogeny of the Mirror Neuron System (MNS), we focus on the preliminary objective of predicting self-actions. We find a good MNS-inspired model in the existing Deep Modality Blending Network (DMBN), which can reconstruct the visuo-motor sensory signal during a partially observed action sequence by leveraging the probabilistic generation of CNP. After a qualitative and quantitative evaluation, we highlight its difficulties in generalizing to unseen action sequences and trace the cause to its internal representation of time. Therefore, we propose a revised version, termed DMBN-Positional Time Encoding (DMBN-PTE), that facilitates learning a more robust…
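The abstract does not spell out how DMBN-PTE encodes time. As a rough illustration of what a positional encoding of time can look like in general, the sketch below maps a normalized scalar time step to sine and cosine features at several frequencies. The function name `positional_time_encoding`, the `num_frequencies` parameter, and the doubling frequency schedule are all assumptions made for this example and are not taken from the paper.

```python
import math


def positional_time_encoding(t, num_frequencies=4):
    """Map a scalar time t (assumed normalized to [0, 1]) to sin/cos features.

    Hypothetical illustration only: the frequency schedule (pi, 2*pi, 4*pi, ...)
    is an assumption, not the encoding used in DMBN-PTE.
    """
    features = []
    for k in range(num_frequencies):
        angle = (2.0 ** k) * math.pi * t  # frequency doubles with each k
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


# Encode 11 evenly spaced time steps of a hypothetical action sequence.
# Each step becomes a vector of 2 * num_frequencies values instead of one scalar.
encoded_steps = [positional_time_encoding(step / 10) for step in range(11)]
```

Under these assumptions, a model conditioned on such a vector receives time as a bounded, multi-scale feature rather than as a single raw scalar, which is one common motivation for encodings of this kind.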
