SEMBED: Semantic Embedding of Egocentric Action Videos

Michael Wray; Davide Moltisanti; Walterio Mayol-Cuevas; Dima Damen

arXiv:1607.08414·cs.CV·August 1, 2016

Open Access

TL;DR

SEMBED embeds egocentric videos into a semantic-visual graph to estimate their labels, capturing both semantic relationships and visual similarities, and outperforms SVM classification by more than 5%.

Contribution

The paper introduces SEMBED, a new approach that combines semantic and visual information for labeling egocentric videos with ambiguous object interactions.

Findings

01

SEMBED outperforms SVM classification by more than 5% on a challenging dataset of 1225 freely annotated egocentric videos.

02

It captures the semantic relationships between labels as well as visual similarities over motion and appearance features.

03

The approach handles ambiguous and unbounded verb labels in egocentric videos.

Abstract

We present SEMBED, an approach for embedding an egocentric object interaction video in a semantic-visual graph to estimate the probability distribution over its potential semantic labels. When object interactions are annotated using unbounded choice of verbs, we embrace the wealth and ambiguity of these labels by capturing the semantic relationships as well as the visual similarities over motion and appearance features. We show how SEMBED can interpret a challenging dataset of 1225 freely annotated egocentric videos, outperforming SVM classification by more than 5%.
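
The abstract gives no implementation detail, so the following is only a toy illustration of the general idea: place a query video in a graph whose edges encode visual and semantic links, then read off a probability distribution over labels. Every name and design choice here is an assumption made for this sketch (the function `label_distribution`, the hand-written `related` verb pairs, k-nearest-neighbour visual edges, and a short random walk); it is not the authors' implementation.

```python
import math


def label_distribution(query, features, labels, related, k=1, steps=2):
    """Toy semantic-visual graph walk (illustrative assumptions throughout)."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]

    # Visual edges (assumed): link each training video to its k nearest neighbours.
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1.0

    # Semantic edges (assumed): link videos whose verb labels match or are related.
    for i in range(n):
        for j in range(n):
            pair = frozenset((labels[i], labels[j]))
            if i != j and (labels[i] == labels[j] or pair in related):
                adj[i][j] = 1.0

    # Row-normalise into a transition matrix; isolated nodes get a self-loop.
    trans = []
    for i, row in enumerate(adj):
        total = sum(row)
        if total == 0:
            row = [1.0 if j == i else 0.0 for j in range(n)]
            total = 1.0
        trans.append([v / total for v in row])

    # Embed the query at its visually nearest training video, then walk the graph.
    start = min(range(n), key=lambda j: math.dist(query, features[j]))
    p = [1.0 if j == start else 0.0 for j in range(n)]
    visits = p[:]
    for _ in range(steps):
        p = [sum(p[i] * trans[i][j] for i in range(n)) for j in range(n)]
        visits = [v + q for v, q in zip(visits, p)]

    # Sum visit mass per label and normalise to a probability distribution.
    scores = {}
    for mass, lab in zip(visits, labels):
        scores[lab] = scores.get(lab, 0.0) + mass
    norm = sum(scores.values())
    return {lab: s / norm for lab, s in scores.items()}


# Hypothetical 2-D features and freely chosen verb labels.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = ["open", "pull", "put", "place"]
related = {frozenset(("open", "pull")), frozenset(("put", "place"))}
print(label_distribution((0.02, 0.1), features, labels, related))
```

In this toy setup the query lands next to the "open" video, and the walk spreads some of its probability mass to the related verb "pull", which is the kind of behaviour the abstract describes as embracing label ambiguity rather than forcing a single class.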

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications