Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM
Yuanjie Lyu, Tong Xu, Zihan Niu, Bo Peng, Jing Ke, Enhong Chen

TL;DR
This paper introduces a novel two-stage prefix-enhanced multimodal large language model for movie event attribution, effectively connecting related events and their causal semantics across entire movies, surpassing existing methods.
Contribution
The proposed TSPE framework innovatively combines local and global stages with prefix-guided attention and a knowledge graph to improve event attribution in movies.
Findings
Outperforms state-of-the-art methods on real-world datasets.
Effectively links related events with causal semantics.
Enhances multimodal event understanding across entire movies.
Abstract
The prosperity of social media platforms has raised the urgent demand for semantic-rich services, e.g., event and storyline attribution. However, most existing research focuses on clip-level event understanding, primarily through basic captioning tasks, without analyzing the causes of events across an entire movie. This is a significant challenge, as even advanced multimodal large language models (MLLMs) struggle with extensive multimodal information due to limited context length. To address this issue, we propose a Two-Stage Prefix-Enhanced MLLM (TSPE) approach for event attribution, i.e., connecting associated events with their causal semantics, in movie videos. In the local stage, we introduce an interaction-aware prefix that guides the model to focus on the relevant multimodal information within a single clip, briefly summarizing the single event. Correspondingly, in the global…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
MethodsFocus
