TL;DR
This paper introduces SR-Clustering, a method that combines semantic and contextual information with temporal coherence to effectively segment egocentric photo streams, improving over existing techniques.
Contribution
The paper proposes a novel semantic regularized clustering approach that integrates CNN-based features and language processing for egocentric photo stream segmentation.
Findings
Outperforms state-of-the-art segmentation methods
Effective in organizing large egocentric image collections
Enhances subsequent activity and event analysis
Abstract
While wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time-consuming process. This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments. First, contextual and semantic information is extracted for each image by employing a Convolutional Neural Network approach. Then, by integrating language processing, a vocabulary of concepts is defined in a semantic space. Finally, by exploiting the temporal coherence in photo streams, images that share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from activity and event recognition to semantic indexing and summarization. Experiments over egocentric sets of…
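The final grouping step in the abstract can be illustrated with a minimal sketch. This is not the SR-Clustering algorithm itself: it is a simplified stand-in that assumes each image has already been reduced to a feature vector (for example, CNN activations concatenated with concept scores) and starts a new segment whenever the cosine similarity between consecutive images falls below a cutoff. The function name `segment_stream` and the `threshold` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def segment_stream(features, threshold=0.8):
    """Assign a segment label to each image in a time-ordered stream.

    Simplified illustration only (hypothetical helper, not the paper's method):
    consecutive images stay in the same segment while the cosine similarity
    of their feature vectors is at least `threshold`.
    """
    feats = np.asarray(features, dtype=float)
    if feats.size == 0:
        return np.zeros(0, dtype=int)

    # Normalise each row to unit length; guard against all-zero vectors.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)

    # Cosine similarity between each image and the one before it.
    sims = np.sum(unit[1:] * unit[:-1], axis=1)

    # A drop below the threshold marks a boundary; the running count of
    # boundaries gives the segment index of every image.
    labels = np.zeros(len(feats), dtype=int)
    labels[1:] = np.cumsum(sims < threshold)
    return labels


if __name__ == "__main__":
    # Toy stream: two images in one scene, then two in another.
    stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(segment_stream(stream))
```

Because boundaries are only placed between temporally adjacent images, every segment is contiguous in time, which is the property the abstract refers to as temporal coherence. A fixed cutoff is the crudest possible choice; it is used here only to keep the sketch short.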
