Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang, Sule Bai, Guangyi Chen, Lei Chen, Jiwen Lu, Junle Wang, Yansong Tang

TL;DR
This paper introduces a novel narrative action evaluation task that generates detailed, objective natural language commentary on actions, using a prompt-guided multimodal interaction framework to improve performance over traditional multi-task learning methods.
Contribution
The paper proposes a prompt-guided multimodal interaction framework for narrative action evaluation, recasts score regression as video-text matching, and re-annotates datasets to establish benchmarks for the task.
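To make the idea of turning score regression into video-text matching concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional embeddings, the three anchor scores, and the temperature are all assumptions made for illustration. A video embedding is compared against embeddings of text prompts that each stand for a quality level, and the predicted score is the similarity-weighted average of the prompts' anchor scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score_by_matching(video_emb, prompt_embs, prompt_scores, temperature=0.1):
    """Hypothetical helper: predict a score by matching a video to text prompts.

    Each prompt embedding represents one quality level with an anchor score.
    The result is the expected anchor score under the softmax of similarities.
    """
    sims = [cosine(video_emb, p) / temperature for p in prompt_embs]
    weights = softmax(sims)
    return sum(w * s for w, s in zip(weights, prompt_scores))


# Toy data (assumed): three quality prompts with anchor scores 20, 50, 80.
prompt_scores = [20.0, 50.0, 80.0]
prompt_embs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]

# A video whose embedding lies close to the "high quality" prompt.
video_emb = [0.1, 0.9]
predicted = score_by_matching(video_emb, prompt_embs, prompt_scores)
print(predicted)
```

In a real system the embeddings would come from learned video and text encoders; the sketch only shows how a matching objective can produce a continuous score.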
Findings
Outperforms separate and naive multi-task learning methods
Re-annotated datasets with high-quality action narration
Established benchmarks for narrative action evaluation
Abstract
In this paper, we investigate a new problem called narrative action evaluation (NAE). NAE aims to generate professional commentary that evaluates the execution of an action. Unlike traditional tasks such as score-based action quality assessment and video captioning involving superficial sentences, NAE focuses on creating detailed narratives in natural language. These narratives provide intricate descriptions of actions along with objective evaluations. NAE is a more challenging task because it requires both narrative flexibility and evaluation rigor. One existing possible solution is to use multi-task learning, where narrative language and evaluative information are predicted separately. However, this approach results in reduced performance for individual tasks because of variations between tasks and differences in modality between language information and evaluation information. To…
Taxonomy
Topics: Language, Metaphor, and Cognition · Speech and dialogue systems
