Vision-State Fusion: Improving Deep Neural Networks for Autonomous Robotics
Elia Cereda, Stefano Bonato, Mirko Nava, Alessandro Giusti, and Daniele Palossi

TL;DR
This paper introduces a novel vision-state fusion approach for deep neural networks that enhances non-egocentric 3D pose estimation in robotics, leading to improved accuracy and real-world UAV performance.
Contribution
To the best of the authors' knowledge, it is the first to apply state fusion to non-egocentric mediated tasks, improving regression accuracy with minimal computational overhead.
Findings
Improved the R² metric by up to 0.51 across diverse tasks.
Achieved a 24% reduction in mean absolute error in UAV human pose estimation.
Validated real-world UAV performance with enhanced accuracy.
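For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how R² and mean absolute error (MAE) are conventionally computed, and how a relative MAE reduction such as "24%" is derived. The function names and all numbers are illustrative assumptions, not the paper's evaluation code or data.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline

# Toy values only (hypothetical, not taken from the paper).
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.5, 3.5, 4.5])
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

Under this definition, a baseline MAE of 1.00 and an improved MAE of 0.76 (again, made-up numbers) would correspond to a 24% reduction.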
Abstract
Vision-based deep learning perception fulfills a paramount role in robotics, facilitating solutions to many challenging scenarios, such as acrobatic maneuvers of autonomous unmanned aerial vehicles (UAVs) and robot-assisted high-precision surgery. Control-oriented end-to-end perception approaches, which directly output control variables for the robot, commonly take advantage of the robot's state estimation as an auxiliary input. When intermediate outputs are estimated and fed to a lower-level controller, i.e. mediated approaches, the robot's state is commonly used as an input only for egocentric tasks, which estimate physical properties of the robot itself. In this work, we propose to apply a similar approach for the first time -- to the best of our knowledge -- to non-egocentric mediated tasks, where the estimated outputs refer to an external subject. We prove how our general…
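The abstract describes the idea of giving the network the robot's state as an auxiliary input, but this excerpt does not say how the fusion is implemented. The sketch below shows one common possibility, concatenating a state vector with visual features before a regression head. The concatenation scheme, the function name `fuse_and_regress`, the dimensions, and the random stand-in features are all assumptions made for illustration, not the authors' architecture.

```python
import numpy as np

def fuse_and_regress(image_features, robot_state, weights, bias):
    """Concatenate visual features with the robot state, then apply a linear head.

    Assumed (hypothetical) fusion scheme: simple concatenation along the
    feature axis, followed by a single linear layer.
    """
    fused = np.concatenate([image_features, robot_state], axis=-1)
    return fused @ weights + bias

rng = np.random.default_rng(0)
batch, n_feat, n_state, n_out = 4, 16, 3, 4

# Stand-in for the output of a vision backbone (hypothetical).
image_features = rng.normal(size=(batch, n_feat))
# Stand-in for the robot's state estimate, e.g. attitude angles (hypothetical).
robot_state = rng.normal(size=(batch, n_state))

# Randomly initialized linear regression head over the fused vector.
weights = rng.normal(size=(n_feat + n_state, n_out))
bias = np.zeros(n_out)

# One output row per sample, e.g. a relative pose of an external subject.
pose = fuse_and_regress(image_features, robot_state, weights, bias)
```

The point of the sketch is only that the prediction for an external subject becomes a function of both the image and the robot's own state, so the same image paired with a different state can yield a different estimate.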
