Summarizing First-Person Videos from Third Persons' Points of Views
Hsuan-I Ho, Wei-Chen Chiu, Yu-Chiang Frank Wang

TL;DR
This paper introduces a semi-supervised deep neural network model that summarizes first-person videos by training on fully annotated third-person videos, unlabeled first-person videos, and a small number of annotated first-person videos, addressing the challenge of viewpoint differences.
Contribution
A novel deep learning architecture designed for first-person video summarization using semi-supervised learning with mixed viewpoint data.
Findings
Effective summarization of first-person videos demonstrated
Model generalizes well across different viewpoints
Qualitative and quantitative results show improved performance
Abstract
Video highlighting, or summarization, is among the interesting topics in computer vision, and it benefits a variety of applications such as viewing, searching, and storage. However, most existing studies rely on training data of third-person videos, and the resulting models cannot easily generalize to highlighting first-person ones. With the goal of deriving an effective model to summarize first-person videos, we propose a novel deep neural network architecture for describing and discriminating vital spatiotemporal information across videos with different points of view. Our proposed model is realized in a semi-supervised setting, in which fully annotated third-person videos, unlabeled first-person videos, and a small number of annotated first-person ones are presented during training. In our experiments, qualitative and quantitative evaluations on both benchmarks and our collected first-person video datasets are…
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
