Analyzing Recursiveness in Multimodal Generative Artificial Intelligence: Stability or Divergence?
Javier Conde, Tobias Cheung, Gonzalo Martínez, Pedro Reviriego, Rik Sarkar

TL;DR
This study investigates whether recursively converting content between modalities (image to text and back again) with multimodal AI tools such as GPT-4o and DALL-E 3 leads to stable content or to divergence. It reports a tendency to diverge, with implications for future content generation practices.
Contribution
It provides an empirical analysis of recursive modality transformations in multimodal AI, highlighting divergence behaviors and their dependence on initial content and model settings.
Findings
Recursive modality changes tend to diverge from the original content.
Divergence varies with initial image type and model configuration.
Multimodal loops do not converge to stable content, which points to a potential limitation of applying these tools recursively.
Abstract
One of the latest trends in generative Artificial Intelligence is tools that generate and analyze content in different modalities, such as text and images, and convert information from one to the other. From a conceptual point of view, it is interesting to study whether these modality changes incur information loss and to what extent. This is analogous to variants of the classic game of telephone, where players alternate between describing images and creating drawings based on those descriptions, leading to unexpected transformations of the original content. In the case of AI, modality changes can be applied recursively: starting from an image, a text that describes it is extracted; the text is used to generate a second image; a text describing that image is extracted; and so on. As this process is applied recursively, AI tools are generating content from one mode to use them to create content in…
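The recursive loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `recursive_modality_loop`, `drift_from_original`, and the toy stand-ins are hypothetical names introduced here, and no real model API is called. The toy "image" is a set of tags, and the toy describer is deliberately lossy (it drops one tag per round), so the drift it shows is built in by assumption and says nothing about how GPT-4o or DALL-E 3 behave.

```python
from typing import Callable, FrozenSet, List, Tuple

ToyImage = FrozenSet[str]  # assumption: an "image" is just a set of tags


def recursive_modality_loop(
    image: ToyImage,
    describe: Callable[[ToyImage], str],
    generate: Callable[[str], ToyImage],
    steps: int,
) -> Tuple[List[ToyImage], List[str]]:
    """Alternate image -> text -> image for `steps` rounds."""
    images = [image]
    texts: List[str] = []
    for _ in range(steps):
        text = describe(images[-1])    # image-to-text step
        texts.append(text)
        images.append(generate(text))  # text-to-image step
    return images, texts


def drift_from_original(
    images: List[ToyImage],
    distance: Callable[[ToyImage, ToyImage], float],
) -> List[float]:
    """Distance between the original image and each regenerated image."""
    return [distance(images[0], img) for img in images[1:]]


# Hypothetical toy stand-ins for the captioning and generation models.
def toy_describe(image: ToyImage) -> str:
    tags = sorted(image)
    # Lossy by construction: omit the alphabetically last tag.
    kept = tags[:-1] if len(tags) > 1 else tags
    return " ".join(kept)


def toy_generate(text: str) -> ToyImage:
    return frozenset(text.split())


def jaccard_distance(a: ToyImage, b: ToyImage) -> float:
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


original = frozenset({"cat", "hat", "red", "sofa"})
images, texts = recursive_modality_loop(original, toy_describe, toy_generate, steps=3)
drift = drift_from_original(images, jaccard_distance)
print(texts)  # ['cat hat red', 'cat hat', 'cat']
print(drift)  # [0.25, 0.5, 0.75]
```

To study real tools, `describe` and `generate` would be replaced by calls to an image-captioning model and an image-generation model, and `distance` by an image similarity measure. Which measure suits that comparison is left open here.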
Taxonomy
Topics: AI in Service Interactions · Language and cultural evolution · Speech and dialogue systems
