ReCorD: Reasoning and Correcting Diffusion for HOI Generation

Jian-Yu Jiang-Lin; Kang-Yang Huang; Ling Lo; Yi-Ning Huang; Terence Lin; Jhih-Ciang Wu; Hong-Han Shuai; Wen-Huang Cheng

arXiv:2407.17911 · cs.MM · July 26, 2024

TL;DR

ReCorD is a training-free method that couples latent diffusion models with visual language models to depict human-object interactions (HOIs) more accurately, with higher image fidelity and lower computational cost.

Contribution

The paper introduces a reasoning-and-correcting framework that refines HOI generation without additional training. It combines an interaction-aware reasoning module, which improves the interpretation of the interaction, with an interaction correcting module, which refines the output image.

Findings

01 Outperforms existing methods in HOI classification score.

02 Achieves a better (lower) FID and a higher Verb CLIP-Score than existing methods.

03 Reduces computational requirements.
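
For readers unfamiliar with the FID metric named in the findings: FID is a distance between the feature statistics of real and generated images, so a lower value indicates better fidelity. The block below is a minimal sketch of that distance, assuming image features have already been extracted (in practice, usually Inception activations). The function name `frechet_distance` and the toy inputs are illustrative assumptions and are not taken from the paper or its repository.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Both inputs are arrays of shape (n_samples, n_features), n_features >= 2.
    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of the product; clip tiny negative values caused by
    # floating-point error before taking the root.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


# Toy usage: random vectors stand in for real image features.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
shifted = real + 1.0  # same covariance, mean shifted by 1 in each dimension

same_score = frechet_distance(real, real)        # close to 0
shifted_score = frechet_distance(real, shifted)  # close to 4 (sum of squared mean shifts)
```

Identical feature sets give a distance near zero, and the score grows as the generated distribution drifts from the real one, which is why a method "outperforming in FID" reports a smaller number.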

Abstract

Diffusion models revolutionize image generation by leveraging natural language to guide the creation of multimedia content. Despite significant advancements in such generative models, challenges persist in depicting detailed human-object interactions, especially regarding pose and object placement accuracy. We introduce a training-free method named Reasoning and Correcting Diffusion (ReCorD) to address these challenges. Our model couples Latent Diffusion Models with Visual Language Models to refine the generation process, ensuring precise depictions of HOIs. We propose an interaction-aware reasoning module to improve the interpretation of the interaction, along with an interaction correcting module to delicately refine the output image for more precise HOI generation. Through a meticulous process of pose selection and object positioning, ReCorD achieves superior fidelity in generated…

Code & Models

Repositories

j1anglin/ReCorD
PyTorch · Official

Taxonomy

Methods: Diffusion