TL;DR
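This paper introduces DECADE, a large-scale dataset of ego-centric videos recorded from a dog's perspective together with her corresponding movements. It uses this data to model dog behavior directly from visual input, which supports a better understanding of how the animal acts and plans.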
Contribution
It presents a novel approach to modeling a visually intelligent agent directly from visual input, introduces a new dataset for this task, and shows that the learned representation generalizes to other tasks.
Findings
Successful modeling of dog actions from visual data
The learned representation encodes information distinct from that of image classification models
Generalizes well to walkable surface estimation
Abstract
We introduce the task of directly modeling a visually intelligent agent. Computer vision typically focuses on solving various subtasks related to visual intelligence. We depart from this standard approach to computer vision; instead we directly model a visually intelligent agent. Our model takes visual information as input and directly predicts the actions of the agent. Toward this end we introduce DECADE, a large-scale dataset of ego-centric videos from a dog's perspective as well as her corresponding movements. Using this data we model how the dog acts and how the dog plans her movements. We show under a variety of metrics that given just visual input we can successfully model this intelligent agent in many situations. Moreover, the representation learned by our model encodes distinct information compared to representations trained on image classification, and our learned…
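The abstract frames the model as predicting the agent's actions from visual input. One way to make that a classification problem, assumed here for illustration and not taken from the text above, is to quantize each joint's continuous movement (for example, a small vector of angle changes between frames) into a few discrete action classes with k-means, then score predicted classes with accuracy. The sketch below is a minimal, dependency-free version of that idea; `kmeans`, `to_action_class`, and `accuracy` are hypothetical helper names, and the movement vectors are toy values, not DECADE data.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain Lloyd's k-means over movement vectors (hypothetical helper).

    Centroids are initialised from the first k points so results are
    deterministic; a real pipeline would likely use a smarter init.
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: each movement goes to its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: _dist2(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        updated: List[Vector] = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                updated.append(tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ))
            else:
                updated.append(centroids[i])  # keep an empty cluster's old centroid
        if updated == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = updated
    return centroids


def to_action_class(movement: Sequence[float], centroids: List[Vector]) -> int:
    """Map a continuous movement to the index of its nearest centroid (its action class)."""
    return min(range(len(centroids)), key=lambda i: _dist2(movement, centroids[i]))


def accuracy(predicted: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of time steps whose predicted action class matches the target."""
    if len(predicted) != len(target) or not target:
        raise ValueError("predicted and target must be non-empty and equal length")
    return sum(p == t for p, t in zip(predicted, target)) / len(target)


if __name__ == "__main__":
    # Toy movements for one hypothetical joint: small motions vs. large motions.
    movements = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]
    centroids = kmeans(movements, k=2)
    labels = [to_action_class(m, centroids) for m in movements]
    print(labels)
```

With the quantized labels as targets, a visual model would be trained to output one class per joint per time step, and `accuracy` (or a related metric) would compare its predictions against these labels. The choice of k-means and of a fixed, small number of classes is an illustrative assumption.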
