Gaudí: Conversational Interactions with Deep Representations to Generate Image Collections
Victor S. Bursztyn, Jennifer Healey, Vishwa Vinay

TL;DR
Gaudí leverages recent advances in language and cross-modal models to enable conversational image search for mood-board creation, transforming the process into an interactive dialogue that captures the storytelling aspect of design.
Contribution
This work introduces a novel conversational system that models mood-boards as storytelling narratives, integrating GPT-3 and CLIP for natural language-driven image search in design.
Findings
To the best of the authors' knowledge, the first to represent mood-boards as stories in AI-assisted design.
Enables natural language interaction for mood-board creation.
Improves alignment with designers' creative storytelling process.
Abstract
Based on recent advances in realistic language modeling (GPT-3) and cross-modal representations (CLIP), Gaudí was developed to help designers search for inspirational images using natural language. In the early stages of the design process, with the goal of eliciting a client's preferred creative direction, designers will typically create thematic collections of inspirational images called "mood-boards". Creating a mood-board involves sequential image searches which are currently performed using keywords or images. Gaudí transforms this process into a conversation where the user is gradually detailing the mood-board's theme. This representation allows our AI to generate new search queries from scratch, straight from a project briefing, following a theme hypothesized by GPT-3. Compared to previous computational approaches to mood-board creation, to the best of our knowledge, ours is…
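The abstract describes searching for images with natural language through cross-modal representations. As a rough illustration only, and not the authors' implementation, the sketch below ranks images by cosine similarity between a query embedding and image embeddings, which is the usual way a shared text-image embedding space such as CLIP's is used for retrieval. The function names (`cosine`, `rank_images`), the file names, and the three-dimensional toy vectors are all hypothetical; a real system would obtain the vectors from the text and image encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_images(query_vec, image_vecs, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for encoder outputs (assumed values, not real data).
image_vecs = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stands in for the embedding of a text query

results = rank_images(query_vec, image_vecs, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In a conversational setting, each new user turn would yield a new or refined query embedding, and the same ranking step would be repeated over the image collection.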
Taxonomy
Topics: Multimodal Machine Learning Applications · Design Education and Practice
Methods: Linear Layer · Byte Pair Encoding · Multi-Head Attention · Cosine Annealing · Attention Dropout · Attention Is All You Need · Softmax · Weight Decay · Adam
