Diffusion-based Visual Anagram as Multi-task Learning

Zhiyuan Xu; Yinhe Chen; Huan-ang Gao; Weiyan Zhao; Guiyu Zhang; Hao Zhao

arXiv:2412.02693·cs.CV·December 4, 2024

TL;DR

This paper introduces a multi-task learning framework for diffusion-based visual anagram generation, addressing concept segregation and domination issues through novel optimization and noise balancing techniques.

Contribution

It proposes a new multi-task learning approach with anti-segregation and noise balancing strategies to improve visual anagram generation using diffusion models.

Findings

1. Enhanced visual anagram quality demonstrated by qualitative results.

2. Quantitative metrics show improved concept diversity and overlap.

3. Method outperforms existing diffusion-based approaches in generating complex anagrams.

Abstract

Visual anagrams are images that change appearance upon transformation, such as flipping or rotation. With the advent of diffusion models, such optical illusions can be generated by averaging noise across multiple views during the reverse denoising process. However, we observe two critical failure modes in this approach: (i) concept segregation, where concepts in different views are generated independently, so the result cannot be considered a true anagram, and (ii) concept domination, where certain concepts overpower others. In this work, we cast the visual anagram generation problem in a multi-task learning setting, where different viewpoint prompts are analogous to different tasks, and derive denoising trajectories that align well across tasks simultaneously. At the core of our designed framework are two newly introduced techniques, where (i) an anti-segregation optimization strategy…
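
The baseline the abstract describes, averaging noise across views during reverse denoising, can be illustrated with a minimal sketch. This is not the paper's code: `averaged_noise`, `toy_predict_noise`, and the `views` structure are hypothetical names chosen for illustration, and the toy predictor merely stands in for a real text-conditioned diffusion denoiser. It shows only the plain averaging step, not the anti-segregation or noise balancing techniques the paper introduces.

```python
import numpy as np


def identity(x):
    return x


def flip_vertical(x):
    # Flipping the rows of an image is its own inverse.
    return x[::-1, :]


def averaged_noise(x_t, views, predict_noise):
    """Combine per-view noise estimates for one denoising step.

    views: list of (prompt, transform, inverse_transform) tuples.
    predict_noise: callable (image, prompt) -> noise estimate (assumed interface).
    """
    estimates = []
    for prompt, transform, inverse in views:
        # Predict noise on the transformed image under this view's prompt...
        eps_view = predict_noise(transform(x_t), prompt)
        # ...then map the estimate back to the canonical orientation.
        estimates.append(inverse(eps_view))
    # Plain averaging treats every view equally.
    return np.mean(estimates, axis=0)


def toy_predict_noise(x, prompt):
    # Hypothetical stand-in for a denoiser: scales the input by prompt length.
    return x * (len(prompt) / 10.0)


x_t = np.arange(6.0).reshape(3, 2)
views = [
    ("a" * 10, identity, identity),            # upright view
    ("b" * 20, flip_vertical, flip_vertical),  # flipped view
]
eps = averaged_noise(x_t, views, toy_predict_noise)
```

In a real sampler, `eps` would replace the single-prompt noise estimate at each reverse step. Because the views are weighted equally, nothing in this step prevents one prompt's estimate from dominating, which is the kind of failure the abstract points to.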

Code & Models

Repositories

pixtella/anagram-mtl (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Computer Science and Engineering

Methods: Diffusion · ALIGN