Comparing Learning Paradigms for Egocentric Video Summarization
Daniel Wen

TL;DR
This paper compares learning paradigms for egocentric video summarization and finds that a prompt fine-tuned GPT-4o outperforms specialized models, while highlighting the need for further advances in first-person video understanding.
Contribution
It provides a comparative analysis of supervised, unsupervised, and prompt fine-tuning approaches for egocentric video summarization, demonstrating the potential of prompt-based models in this domain.
Findings
Prompt fine-tuned GPT-4o outperforms the specialized models (Shotluck Holmes and TAC-SUM).
State-of-the-art models perform less effectively on first-person videos than on third-person videos.
The evaluation was conducted on a small subset of egocentric videos.
Abstract
In this study, we investigate various computer vision paradigms (supervised learning, unsupervised learning, and prompt fine-tuning) by assessing their ability to understand and interpret egocentric video data. Specifically, we examine Shotluck Holmes (state-of-the-art supervised learning), TAC-SUM (state-of-the-art unsupervised learning), and GPT-4o (a prompt fine-tuned pre-trained model), evaluating their effectiveness in video summarization. Our results demonstrate that current state-of-the-art models perform less effectively on first-person videos than on third-person videos, highlighting the need for further advancements in the egocentric video domain. Notably, a prompt fine-tuned general-purpose GPT-4o model outperforms these specialized models, emphasizing the limitations of existing approaches in adapting to the unique challenges of first-person perspectives. Although our…
Taxonomy
Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
