TL;DR
This paper introduces a multi-modal, sequential AI creation task inspired by human experience and addresses it with a multi-channel sequence-to-sequence model that uses attention and curriculum negative sampling, validated on a new dataset.
Contribution
It proposes a new multi-modal, sequential AI creation task, along with a multi-channel architecture, a curriculum negative sampling strategy, and a new dataset for benchmarking.
Findings
Significant improvements over baselines in automatic metrics
Effective modeling of multi-modal sequential information
Validated on a newly labeled multi-modal experience dataset
Abstract
AI creation, such as poem or lyrics generation, has attracted increasing attention from both industry and academia, with many promising models proposed in the past few years. Existing methods usually generate their outputs from a single, independent piece of visual or textual information. In reality, however, humans usually create according to their experiences, which may involve different modalities and be sequentially correlated. To model such human capabilities, in this paper, we define and solve a novel AI creation problem based on human experiences. More specifically, we study how to generate texts based on sequential multi-modal information. Compared with previous work, this task is much more difficult because the model must understand and align the semantics across different modalities and effectively convert them into the output in a sequential…
