Harnessing LLMs for Automated Video Content Analysis: An Exploratory Workflow of Short Videos on Depression
Jiaying Lizzy Liu, Yunlong Wang, Yao Lyu, Yiheng Su, Shuo Niu, Xuhai "Orson" Xu, Yan Zhang

TL;DR
This study explores the use of Large Language Models (LLMs) to analyze short videos about depression. It develops a new multimodal workflow and compares LLM annotations with those of human coders to assess accuracy and limitations.
Contribution
It introduces a novel LLM-assisted multimodal workflow for video content analysis, including prompt engineering and human evaluation, specifically applied to depression-related videos.
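As a rough illustration of the prompt-engineering step, the sketch below builds an annotation prompt from a codebook and validates a structured reply. The category names follow this summary (object, activity, emotion, genre), but the label sets, the prompt wording, and the helper names `build_annotation_prompt` and `parse_annotation` are assumptions made for the example, not the authors' actual codebook or prompts.

```python
import json

# Hypothetical codebook: category names come from the summary,
# the allowed labels are illustrative only.
CODEBOOK = {
    "object": ["person", "text overlay", "phone", "other"],
    "activity": ["talking", "walking", "sitting", "other"],
    "emotion": ["sad", "neutral", "happy", "other"],
    "genre": ["vlog", "animation", "educational", "other"],
}


def build_annotation_prompt(codebook):
    """Turn a codebook into a prompt that asks for one JSON object."""
    lines = [
        "Annotate the keyframe. Reply with one JSON object.",
        "Use exactly one label per category:",
    ]
    for category, labels in codebook.items():
        lines.append(f"- {category}: {', '.join(labels)}")
    return "\n".join(lines)


def parse_annotation(reply, codebook):
    """Parse the model's JSON reply; labels outside the codebook become None."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        data = {}
    result = {}
    for category, labels in codebook.items():
        value = data.get(category)
        result[category] = value if value in labels else None
    return result
```

Rejecting out-of-codebook labels at parse time keeps malformed replies visible to the human evaluators instead of letting them pass silently into the analysis.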
Findings
LLMs annotate objects and activities more accurately than emotions and genres.
The workflow enhances transparency through explanation prompts.
Potential and limitations of LLMs in video annotation are identified.
Abstract
Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs in assisting video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to get LLM Annotations in structured form and explanation prompts to generate LLM Explanations for a better understanding of LLM reasoning and transparency. To test LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. We compared the LLM Annotations with those of two human coders and found that LLM has higher accuracy in object and…
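The comparison between LLM Annotations and human codes can be sketched as a per-category accuracy calculation. This is a minimal sketch, assuming one agreed human reference label per keyframe and category; the summary does not say how the two coders' judgments were combined or which metric the authors reported, and the function name `per_category_accuracy` and the data layout are hypothetical.

```python
from collections import defaultdict


def per_category_accuracy(llm_annotations, human_annotations):
    """Share of keyframes where the LLM label matches the human reference label.

    Both arguments are lists of dicts, one dict per keyframe, mapping a
    category name (e.g. "object") to a label. Assumed layout, not the paper's.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for llm, human in zip(llm_annotations, human_annotations):
        for category, human_label in human.items():
            total[category] += 1
            if llm.get(category) == human_label:
                correct[category] += 1
    return {category: correct[category] / total[category] for category in total}
```

Calling this with the LLM's annotations and the human reference annotations for the same keyframes returns one accuracy value per category, which is the kind of breakdown needed to compare object and activity annotations against emotion and genre.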
