Forecasting Hands and Objects in Future Frames
Chenyou Fan, Jangwon Lee, Michael S. Ryoo

TL;DR
This paper introduces a CNN-based approach that predicts the future presence and location of hands and objects in video by regressing the intermediate scene representation of a future frame from that of the current frame.
Contribution
It proposes a new two-stream CNN architecture that extends a convolutional object detection network, together with a fully convolutional regression network that predicts future scene representations, advancing the state of the art in future object forecasting.
Findings
Achieves higher accuracy than previous methods on a public dataset.
Effectively predicts future object presence and locations in video frames.
Combines scene representation regression with detection for reliable forecasting.
Abstract
This paper presents an approach to forecast the future presence and location of human hands and objects. Given an image frame, the goal is to predict what objects will appear in a future frame (e.g., 5 seconds later) and where they will be located, even when they are not visible in the current frame. The key idea is that (1) an intermediate representation of a convolutional object recognition model abstracts scene information in its frame and that (2) we can predict (i.e., regress) such representations corresponding to the future frames based on that of the current frame. We design a new two-stream convolutional neural network (CNN) architecture for videos by extending the state-of-the-art convolutional object detection network, and present a new fully convolutional regression network for predicting future scene representations. Our experiments confirm that combining the regressed…
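The regression idea in the abstract can be illustrated with a deliberately simplified sketch. It assumes the scene representation is a feature map of shape (channels, height, width) and replaces the paper's regression network with a single 1x1 convolution fitted by least squares on synthetic data. The function names `conv1x1` and `fit_conv1x1`, the tensor shapes, and the synthetic "future" features are all hypothetical and are not taken from the paper.

```python
import numpy as np


def conv1x1(features, weight, bias):
    """Apply a 1x1 convolution to one feature map.

    features: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,).
    The same linear map is applied at every spatial location, which is
    what makes the layer fully convolutional.
    """
    return np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]


def fit_conv1x1(current, future):
    """Least-squares fit of a 1x1 convolution from current to future maps.

    current, future: (N, C, H, W). This closed-form fit is an assumption
    made for illustration; it is not the paper's training procedure.
    """
    n, c_in, h, w = current.shape
    c_out = future.shape[1]
    # Treat every spatial location of every sample as one training row.
    x = current.transpose(0, 2, 3, 1).reshape(-1, c_in)
    y = future.transpose(0, 2, 3, 1).reshape(-1, c_out)
    x_aug = np.hstack([x, np.ones((x.shape[0], 1))])  # extra column for bias
    solution, *_ = np.linalg.lstsq(x_aug, y, rcond=None)
    return solution[:-1].T, solution[-1]


rng = np.random.default_rng(0)

# Synthetic stand-ins: "future" features are a fixed linear function of the
# "current" ones, so an exact 1x1 regressor exists and can be recovered.
true_weight = rng.normal(size=(4, 4))
true_bias = rng.normal(size=4)
current = rng.normal(size=(8, 4, 5, 5))
future = np.stack([conv1x1(f, true_weight, true_bias) for f in current])

weight, bias = fit_conv1x1(current, future)
predicted = conv1x1(current[0], weight, bias)
max_error = float(np.abs(predicted - future[0]).max())
```

In a pipeline like the one the abstract describes, the regressed map would then be passed to the remaining layers of the detection network to produce future object locations; that step is omitted here.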
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Human Pose and Action Recognition
