TalkMosaic: Interactive PhotoMosaic with Multi-modal LLM Q&A Interactions
Kevin Li, Fulu Li

TL;DR
TalkMosaic introduces an interactive photomosaic system combining multimodal LLMs and image interaction to enhance environmental awareness and provide detailed car-related information efficiently.
Contribution
The paper presents TalkMosaic, a custom multimodal GPT built by incorporating car images and related knowledge into ChatGPT, together with optimized inference techniques for interactive photomosaic applications.
Findings
An interactive photomosaic with click-and-display functionality that switches between a tile and its original car image.
Faster inference through sparse attention and quantization techniques.
A prototype demonstrates feasibility and practical utility.
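To make the quantization finding concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization, one common form of the technique. It is a generic illustration, not the scheme used in the paper, which this summary does not specify. The function names `quantize_int8` and `dequantize` are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumption: symmetric per-tensor quantization, chosen only for illustration.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid division by zero when every weight is zero (or the input is empty).
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    codes, scale = quantize_int8([1.0, -0.25, 0.0])
    print(codes, dequantize(codes, scale))
```

Storing each weight as one byte instead of four reduces memory traffic, which is typically where the inference speedup comes from. The cost is a rounding error of at most half a quantization step per weight.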
Abstract
We use images of a wide variety of cars to compose an image of an animal, such as a bird or a lion, on the theme of environmental protection, both to maximize the information about cars in a single composed image and to raise awareness of environmental challenges. We present a novel way of interacting with an artistically composed photomosaic image, in which a simple "click and display" operation switches between a tile image in the photomosaic and the corresponding original car image, which is automatically saved on the Desktop. We build a multimodal custom GPT named TalkMosaic by incorporating car image information and related knowledge into ChatGPT. By uploading the original car image to TalkMosaic, we can ask questions about the given car image and get the corresponding answers efficiently and effectively such as…
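The "click and display" interaction described in the abstract amounts to mapping a click position to the tile under it and then to that tile's original image. Below is a minimal sketch of that lookup, assuming a regular grid of equally sized tiles stored in row-major order. The `MosaicLayout` class, the `tile_at` function, and the file names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MosaicLayout:
    """Assumed layout: a regular grid of equally sized tiles."""
    tile_width: int
    tile_height: int
    columns: int
    rows: int
    sources: Tuple[str, ...]  # original image path per tile, row-major order


def tile_at(layout: MosaicLayout, x: int, y: int) -> Optional[str]:
    """Return the original image path for the tile under pixel (x, y).

    Returns None when the click falls outside the mosaic.
    """
    if x < 0 or y < 0:
        return None
    col = x // layout.tile_width
    row = y // layout.tile_height
    if col >= layout.columns or row >= layout.rows:
        return None
    return layout.sources[row * layout.columns + col]


if __name__ == "__main__":
    layout = MosaicLayout(10, 10, 2, 2, ("a.jpg", "b.jpg", "c.jpg", "d.jpg"))
    print(tile_at(layout, 15, 5))
```

In a real viewer, the returned path would then be used to display the original car image and save a copy, as the abstract describes. Those steps depend on the GUI toolkit and are left out here.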
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Wikis in Education and Collaboration
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Multi-Head Attention · Weight Decay · Linear Warmup With Cosine Annealing · Adam · Residual Connection · Byte Pair Encoding
