Enhancing Generative AI Image Refinement with Scribbles and Annotations: A Comparative Study of Multimodal Prompts
Hyerim Park, Phuong Thao Tran, Andre Luckow, Ceenu George, Michael Sedlmair, Malin Eiband

TL;DR
This study investigates how combining text prompts with pen-based scribbles and annotations can improve the precision, efficiency, and user experience of generative AI image refinement, drawing on a formative study with seven professional designers and a within-subjects study with 30 designers and design students.
Contribution
It provides an empirical comparison of multimodal prompts for GenAI refinement, introduces a prototype supporting scribbles and annotations, and offers insights into designers' multimodal strategies.
Findings
Visual prompts enhance spatial editing speed and clarity.
Text prompts are effective for semantic and global changes.
Combined multimodal prompts offer the best overall user experience.
Abstract
Generative AI (GenAI) image tools are increasingly used in design practice, enabling rapid ideation but offering limited support for refinement tasks such as adjusting layout, scale, or visual attributes. While text prompts and inpainting allow localized edits, they often remain inefficient or ambiguous for precise, in-context, and iterative refinement -- motivating the exploration of alternative methods. This work examines how pen-based scribbles and annotations can enhance GenAI image refinement. A formative study with seven professional designers informed a prototype supporting three input modalities: text-only, visual-only, and combined prompting. A within-subjects study with 30 designers and design students compared these modalities across closed- and open-ended tasks, evaluating expressiveness, efficiency, workload, user experience, iteration, and multimodal strategies. Visual…
Taxonomy
Topics: Data Visualization and Analytics · Interactive and Immersive Displays · Design Education and Practice
