Multimodal Storytelling via Generative Adversarial Imitation Learning
Zhiqian Chen, Xuchao Zhang, Arnold P. Boedihardjo, Jing Dai, and Chang-Tien Lu

TL;DR
This paper introduces MIL-GAN, a multimodal imitation learning approach using GANs to model user interests in storytelling, effectively capturing cross-modality information and outperforming existing methods in aligning with user preferences.
Contribution
It presents a novel multimodal imitation learning framework with GANs that directly models user interests from diverse data sources for storytelling.
Findings
Outperforms competing methods in user preference alignment
Successfully models cross-modality information in storytelling
Demonstrates effectiveness through a user study
Abstract
Deriving event storylines is an effective summarization method to succinctly organize extensive information, which can significantly alleviate the pain of information overload. The critical challenge is the lack of a widely recognized definition of a storyline metric. Prior studies have developed various approaches based on different assumptions about users' interests. These works can extract interesting patterns, but their assumptions do not guarantee that the derived patterns will match users' preferences. Moreover, their exclusive reliance on a single-modality source misses cross-modality information. This paper proposes a method, multimodal imitation learning via generative adversarial networks (MIL-GAN), to directly model users' interests as reflected by various data. In particular, the proposed model addresses the critical challenge by imitating users' demonstrated storylines. Our…
Taxonomy
Topics: Video Analysis and Summarization · Topic Modeling · Music and Audio Processing
