What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions

Kiana Ehsani; Daniel Gordon; Thomas Nguyen; Roozbeh Mottaghi; Ali Farhadi

arXiv:2010.08539·cs.CV·March 9, 2021·1 cites

TL;DR

This paper introduces a visual representation learning method that uses human interaction and attention cues (body part movements and gaze) as supervision. The learned representation outperforms the visual-only method MoCo on a variety of target tasks.

Contribution

It presents a new dataset of human interactions, capturing body part movements and gaze, and an approach that uses this data to learn visual representations that outperform the visual-only state-of-the-art method MoCo.

Findings

01 Outperforms MoCo on various tasks

02 Uses human interaction cues for learning

03 Provides a new dataset for interaction-based learning

Abstract

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision. Most representation learning approaches rely solely on visual data such as images or videos. In this paper, we explore a novel approach, where we use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations. For this study, we collect a dataset of human interactions capturing body part movements and gaze in their daily lives. Our experiments show that our "muscly-supervised" representation that encodes interaction and attention cues outperforms a visual-only state-of-the-art method, MoCo (He et al., 2020), on a variety of target tasks: scene classification (semantic), action recognition (temporal), depth estimation (geometric), dynamics prediction (physics) and walkable…

Code & Models

Repositories

ehsanik/muscleTorch (PyTorch, official)

Videos

What Can You Learn From Your Muscles? Learning Visual Representation from Human Interactions · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: InfoNCE · Batch Normalization · Momentum Contrast