SEMBED: Semantic Embedding of Egocentric Action Videos
Michael Wray, Davide Moltisanti, Walterio Mayol-Cuevas, Dima Damen

TL;DR
SEMBED embeds egocentric videos in a semantic-visual graph to estimate a probability distribution over their labels. It captures both semantic relationships between labels and visual similarities between videos, and it outperforms SVM classification by more than 5%.
Contribution
The paper introduces SEMBED, an approach that combines semantic and visual information to label egocentric object interaction videos whose verb annotations are ambiguous.
Findings
SEMBED outperforms SVM classification by more than 5% on a challenging dataset of 1225 freely annotated egocentric videos.
It effectively captures semantic relationships and visual similarities.
The approach handles ambiguous and unbounded verb labels in egocentric videos.
Abstract
We present SEMBED, an approach for embedding an egocentric object interaction video in a semantic-visual graph to estimate the probability distribution over its potential semantic labels. When object interactions are annotated using unbounded choice of verbs, we embrace the wealth and ambiguity of these labels by capturing the semantic relationships as well as the visual similarities over motion and appearance features. We show how SEMBED can interpret a challenging dataset of 1225 freely annotated egocentric videos, outperforming SVM classification by more than 5%.
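To make the idea in the abstract concrete, the following is a minimal sketch of how a semantic-visual graph could yield a probability distribution over labels. It is not the authors' implementation. Everything in it is an assumption made for illustration: the cosine similarity over toy feature vectors, the `related` set of verb pairs standing in for semantic relationships, the k-nearest-neighbour linking, the short random walk, and all function names.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_graph(features, labels, related, k=1):
    """Hypothetical graph: visual k-NN edges plus edges between related verbs."""
    n = len(features)
    edges = defaultdict(set)
    # Visual links: connect each video to its k most similar videos.
    for i in range(n):
        sims = sorted(
            ((cosine(features[i], features[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            edges[i].add(j)
            edges[j].add(i)
    # Semantic links: connect videos whose verb labels are marked as related.
    for i in range(n):
        for j in range(i + 1, n):
            if frozenset((labels[i], labels[j])) in related:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def label_distribution(query, features, labels, edges, k=1, steps=2):
    """Spread probability mass from the query's nearest nodes along the graph."""
    sims = sorted(((cosine(query, f), i) for i, f in enumerate(features)), reverse=True)
    # Start with uniform mass on the k visually nearest training videos.
    mass = {i: 1.0 / k for _, i in sims[:k]}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, m in mass.items():
            nbrs = edges[node]
            if not nbrs:
                nxt[node] += m  # isolated node keeps its mass
                continue
            for nb in nbrs:
                nxt[nb] += m / len(nbrs)  # uniform step to neighbours
        mass = nxt
    # Sum the mass per verb label to obtain a distribution over labels.
    dist = defaultdict(float)
    for node, m in mass.items():
        dist[labels[node]] += m
    return dict(dist)


# Toy data (invented): four training videos with 2-D features and free-form verbs.
features = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = ["open", "pull", "close", "shut"]
related = {frozenset(("open", "pull")), frozenset(("close", "shut"))}

edges = build_graph(features, labels, related, k=1)
dist = label_distribution((1.0, 0.05), features, labels, edges, k=1, steps=2)
```

In this toy setup a query that looks like the "open" and "pull" videos places all of its mass on those two verbs, which illustrates how related labels can share probability rather than compete as separate classes.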
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications
