Multimodal Language Models for Domain-Specific Procedural Video Summarization

Nafisa Hussain

arXiv:2407.05419·cs.CV·July 9, 2024

Open Access

TL;DR

This paper investigates fine-tuning multimodal large language models, specifically TimeChat, on domain-specific datasets to improve summarization and step-by-step instruction generation for long instructional videos in the cooking and medical domains.

Contribution

The paper demonstrates that domain-specific fine-tuning of TimeChat improves video summarization and instruction extraction in specialized fields.

Findings

01. Fine-tuning improves key step extraction accuracy.

02. Domain-specific datasets enhance summarization quality.

03. Models provide personalized, domain-relevant guidance.

Abstract

Videos serve as a powerful medium to convey ideas, tell stories, and provide detailed instructions, especially through long-format tutorials. Such tutorials are valuable for learning new skills at one's own pace, yet they can be overwhelming due to their length and dense content. Viewers often seek specific information, like precise measurements or step-by-step execution details, making it essential to extract and summarize key segments efficiently. An intelligent, time-sensitive video assistant capable of summarizing and detecting highlights in long videos is highly sought after. Recent advancements in Multimodal Large Language Models offer promising solutions to develop such an assistant. Our research explores the use of multimodal models to enhance video summarization and step-by-step instruction generation within specific domains. These models need to understand temporal events and…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods