MiniGPT-Reverse-Designing: Predicting Image Adjustments Utilizing MiniGPT-4
Vahid Azizi, Fatemeh Koochaki

TL;DR
This paper extends MiniGPT-4 to perform reverse designing, enabling it to predict image edits and parameters from source and edited images along with optional textual descriptions, demonstrating the model's adaptability to complex vision-language tasks.
Contribution
The paper introduces a fine-tuning approach for MiniGPT-4 to handle reverse designing tasks, showcasing the model's ability to understand and predict image modifications based on multimodal inputs.
Findings
MiniGPT-4 can be adapted for complex vision-language tasks.
The model successfully predicts image edits and parameters.
Code for the approach is publicly available.
Abstract
Vision-Language Models (VLMs) have recently advanced significantly through integration with Large Language Models (LLMs). VLMs, which process image and text modalities simultaneously, have demonstrated the ability to learn and understand the interaction between images and text across various multi-modal tasks. Reverse designing, which can be defined as a complex vision-language task, aims to predict the edits and their parameters given a source image, an edited version, and an optional high-level textual edit description. This task requires VLMs to comprehend the interplay between the source image, the edited version, and the optional textual context simultaneously, going beyond traditional vision-language tasks. In this paper, we extend and fine-tune MiniGPT-4 for the reverse designing task. Our experiments demonstrate the extensibility of off-the-shelf VLMs,…
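To make the task's inputs and outputs concrete, the sketch below models one reverse-designing sample and a plain-text target that a fine-tuned model could be trained to emit. It is an illustration only, not the authors' code: the names `Edit`, `ReverseDesignSample`, `build_prompt`, `serialize_edits`, and `parse_edits`, the bracketed image placeholders, and the `name=value; name=value` target format are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    """One predicted edit (hypothetical schema): an operation name and its parameter."""
    name: str
    value: float


@dataclass
class ReverseDesignSample:
    """Inputs of the task: a source image, its edited version, optional text."""
    source_image: str
    edited_image: str
    description: Optional[str] = None


def build_prompt(sample: ReverseDesignSample) -> str:
    """Build a text prompt; the bracketed tokens are placeholders, not real model tokens."""
    lines = ["Source: [SOURCE_IMAGE]", "Edited: [EDITED_IMAGE]"]
    if sample.description:
        lines.append(f"Description: {sample.description}")
    lines.append("List the edits and their parameters.")
    return "\n".join(lines)


def serialize_edits(edits: List[Edit]) -> str:
    """Turn a list of edits into the assumed 'name=value; name=value' target string."""
    return "; ".join(f"{e.name}={e.value}" for e in edits)


def parse_edits(text: str) -> List[Edit]:
    """Parse the assumed target format back into Edit objects."""
    edits = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate trailing separators and empty output
        name, _, value = chunk.partition("=")
        edits.append(Edit(name.strip(), float(value)))
    return edits
```

Under these assumptions, a sample with the description "make it brighter" yields a prompt containing that description, and the model's text output can be scored by parsing it with `parse_edits` and comparing the result against the ground-truth edit list.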
Taxonomy
Topics: Medical Image Segmentation Techniques
