Generative Timelines for Instructed Visual Assembly

Alejandro Pardo; Jui-Hsien Wang; Bernard Ghanem; Josef Sivic; Bryan Russell; Fabian Caba Heilbron

arXiv:2411.12293 · cs.CV · November 20, 2024

Open Access

TL;DR

This paper introduces the Timeline Assembler, a generative multimodal model that lets users edit visual timelines through natural language instructions, making complex visual assembly accessible and efficient.

Contribution

It presents a novel multimodal model and a dataset generation method, and demonstrates better performance than baselines, including GPT-4o, on visual timeline assembly tasks.

Findings

01. Outperforms baseline models, including GPT-4o

02. Successfully handles complex visual assembly instructions

03. Creates new datasets for image and video assembly

Abstract

The objective of this work is to manipulate visual timelines (e.g. a video) through natural language instructions, making complex timeline editing tasks accessible to non-expert or potentially even disabled users. We call this task Instructed visual assembly. This task is challenging as it requires (i) identifying relevant visual content in the input timeline as well as retrieving relevant visual content in a given input (video) collection, (ii) understanding the input natural language instruction, and (iii) performing the desired edits of the input visual timeline to produce an output timeline. To address these challenges, we propose the Timeline Assembler, a generative model trained to perform instructed visual assembly tasks. The contributions of this work are three-fold. First, we develop a large multimodal language model, which is designed to process visual content, compactly…

Taxonomy

Topics: Manufacturing Process and Optimization · Human Motion and Animation · Augmented Reality Applications