A Semi-Automated Method for Object Segmentation in Infant's Egocentric Videos to Study Object Perception
Qazaleh Mirsharif, Sidharth Sadani, Shishir Shah, Hanako Yoshida, Joseph Burling

TL;DR
This paper introduces a semi-automated, domain-specific object segmentation method for infant egocentric videos, enabling efficient annotation and analysis of object perception during early childhood development.
Contribution
The paper presents a novel semi-automated segmentation approach combining user input, graph cut, and optical flow tailored for challenging infant egocentric videos.
Findings
High speed and accuracy when segmenting large volumes of video
Effective handling of large head movements and changing object properties
Facilitates cognitive studies of object perception in infants
Abstract
Object segmentation in infant's egocentric videos is a fundamental step in studying how children perceive objects in the early stages of development. From the computer vision perspective, object segmentation in such videos poses several challenges: the child's view is unfocused and often involves large head movements, resulting in sudden changes in the child's point of view that lead to frequent changes in object properties such as size, shape, and illumination. In this paper, we develop a semi-automated, domain-specific method to address these concerns and facilitate the object annotation process for cognitive scientists, allowing them to select and monitor the object under segmentation. The method starts with a user annotation of the desired object and employs graph cut segmentation and optical flow computation to predict the object mask for subsequent video frames…
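The propagation step described in the abstract (using optical flow to predict the object mask in the next frame) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `propagate_mask`, `mask_area`, and `needs_user_check`, the nearest-pixel forward warp, and the 50% area-change threshold are all assumptions made for illustration. The flow field is taken as given, and the graph cut refinement that would follow is omitted.

```python
from typing import List, Tuple

Mask = List[List[int]]                   # binary mask, 1 = object pixel
Flow = List[List[Tuple[float, float]]]   # per-pixel (dx, dy) displacement


def propagate_mask(mask: Mask, flow: Flow) -> Mask:
    """Forward-warp a binary mask: move each object pixel by its flow vector.

    Hypothetical helper; pixels that land outside the frame are dropped.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dx, dy = flow[y][x]
                nx, ny = x + int(round(dx)), y + int(round(dy))
                if 0 <= nx < w and 0 <= ny < h:
                    out[ny][nx] = 1
    return out


def mask_area(mask: Mask) -> int:
    """Number of object pixels in the mask."""
    return sum(sum(row) for row in mask)


def needs_user_check(prev: Mask, cur: Mask, max_change: float = 0.5) -> bool:
    """Assumed heuristic: flag the frame when the mask area changes sharply,
    for example after a large head movement, so the user can re-annotate."""
    a0, a1 = mask_area(prev), mask_area(cur)
    if a0 == 0:
        return True
    return abs(a1 - a0) / a0 > max_change


if __name__ == "__main__":
    mask = [[0, 0, 0, 0, 0],
            [0, 1, 1, 0, 0],
            [0, 1, 1, 0, 0],
            [0, 0, 0, 0, 0]]
    # Uniform flow: every pixel moves one column to the right.
    flow = [[(1.0, 0.0)] * 5 for _ in range(4)]
    predicted = propagate_mask(mask, flow)
    for row in predicted:
        print(row)
    print("ask user:", needs_user_check(mask, predicted))
```

In a fuller pipeline, the predicted mask would typically seed a graph cut step on the new frame rather than be used directly, and the review flag is one possible way to let the annotator monitor the object under segmentation.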
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection · Robotics and Sensor-Based Localization
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
