From Lifestyle Vlogs to Everyday Interactions
David F. Fouhey, Wei-cheng Kuo, Alexei A. Efros, Jitendra Malik

TL;DR
This paper leverages Internet Lifestyle Vlogs to gather diverse interaction data for understanding human actions, revealing biases in traditional datasets and benchmarking tasks like object contact detection and hand future prediction.
Contribution
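It reverses the usual data collection order: instead of starting from a list of action labels and querying search engines for matching videos, it first gathers a large collection of interaction-rich lifestyle vlogs and then annotates and analyzes it. This yields greater scale and diversity of actions and actors, and exposes biases in explicitly gathered data.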
Findings
Collected large, diverse interaction data from vlogs
Identified biases in common explicitly gathered datasets
Benchmarked object contact and hand future prediction tasks
Abstract
A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is lack of good training data. Most past efforts have gathered this data explicitly: starting with a laundry list of action labels, and then querying search engines for videos tagged with each label. In this work, we do the reverse and search implicitly: we start with a large collection of interaction-rich video data and then annotate and analyze it. We use Internet Lifestyle Vlogs as the source of surprisingly large and diverse interaction data. We show that by collecting the data first, we are able to achieve greater scale and far greater diversity in terms of actions and actors. Additionally, our data exposes biases built into common explicitly gathered data. We make sense of our data by analyzing the central component of interaction -- hands. We…
