Action-conditioned Deep Visual Prediction with RoAM, a new Indoor Human Motion Dataset for Autonomous Robots
Meenakshi Sarkar, Vinayak Honkote, Dibyendu Das, Debasish Ghose

TL;DR
This paper introduces the RoAM dataset, a comprehensive indoor human motion dataset captured from a robot's perspective, and benchmarks a new action-conditioned visual prediction framework for autonomous robot navigation.
Contribution
The paper presents the RoAM dataset for indoor human motion prediction and introduces ACPNet, a novel framework for action-conditioned future frame prediction in mobile robotics.
Findings
ACPNet effectively predicts future frames conditioned on robot actions.
The RoAM dataset enables benchmarking of visual prediction in dynamic indoor environments.
Incorporating robot dynamics improves prediction accuracy.
Abstract
With the increasing adoption of robots across industries, it is crucial to focus on developing advanced algorithms that enable robots to anticipate, comprehend, and plan their actions effectively in collaboration with humans. We introduce the Robot Autonomous Motion (RoAM) video dataset, which is collected with a custom-made TurtleBot3 Burger robot in a variety of indoor environments, recording various human motions from the robot's ego-vision. The dataset also includes synchronized records of the LiDAR scan and all control actions taken by the robot as it navigates around static and moving human agents. This unique dataset provides an opportunity to develop and benchmark new visual prediction frameworks that can predict future image frames based on the action taken by the recording agent in partially observable scenarios or cases where the imaging sensor is mounted on a moving platform.…
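The abstract says the video frames come with synchronized records of the robot's control actions. As a rough illustration of what that pairing could look like when preparing inputs for an action-conditioned predictor, the sketch below matches each frame to the action with the nearest timestamp. Everything here is an assumption made for illustration: the function name `nearest_action`, the timestamps, and the representation of an action as a (linear velocity, angular velocity) tuple are hypothetical and are not taken from the RoAM release or from ACPNet.

```python
from bisect import bisect_left


def nearest_action(frame_ts, action_ts, actions):
    """Pair each frame timestamp with the action closest to it in time.

    Hypothetical helper: `action_ts` must be sorted in ascending order and
    have the same length as `actions`.
    """
    paired = []
    for t in frame_ts:
        i = bisect_left(action_ts, t)
        # Only the action just before and the action just after t can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(action_ts)]
        best = min(candidates, key=lambda j: abs(action_ts[j] - t))
        paired.append(actions[best])
    return paired


# Made-up timestamps in seconds and made-up (linear, angular) velocity commands.
frame_ts = [0.00, 0.10, 0.20]
action_ts = [0.02, 0.09, 0.21, 0.30]
actions = [(0.1, 0.0), (0.1, 0.2), (0.0, 0.2), (0.0, 0.0)]

pairs = nearest_action(frame_ts, action_ts, actions)
print(pairs)
```

Each entry of `pairs` is the action assigned to the frame at the same index, so a model could consume the two sequences side by side. Nearest-timestamp matching is only one possible choice; interpolating between neighbouring actions would be another.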
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging
