Improving Personalized Image Generation through Social Context Feedback
Parul Gupta, Abhinav Dhall, Thanh-Toan Do

TL;DR
This paper introduces a feedback-based fine-tuning approach for personalized image generation that leverages multiple detectors to improve pose accuracy, identity preservation, and gaze consistency, enhancing realism and scene coherence.
Contribution
It proposes fine-tuning existing personalized diffusion models with feedback from multiple detectors (pose, human-object interaction, facial recognition, and gaze-point estimation) to address key limitations in personalized image generation.
Findings
Improved interaction accuracy in generated images.
Enhanced preservation of human identities.
Better gaze and pose consistency.
Abstract
Personalized image generation, in which reference images of one or more subjects are used to generate images of those subjects according to a scene description, has gathered significant interest in the community. However, such generated images suffer from three major limitations: (1) complex activities, such as <man, pushing, motorcycle>, are not generated properly and show incorrect human poses; (2) reference human identities are not preserved; and (3) generated human gaze patterns are unnatural or inconsistent with the scene description. In this work, we propose to overcome these shortcomings through feedback-based fine-tuning of existing personalized generation methods, wherein state-of-the-art detectors of pose, human-object interaction, human facial recognition, and human gaze-point estimation are used to refine the diffusion model. We also propose timestep-based incorporation of different feedback modules, depending…
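To make the idea of combining several detector-based feedback terms with timestep-dependent activation more concrete, here is a minimal sketch under stated assumptions. Everything in it is hypothetical: the `FeedbackModule` structure, the detector output keys (`pose_error`, `hoi_score`, `face_similarity`, `gaze_error`), and especially the timestep windows, which are purely illustrative because the criterion the paper uses to assign modules to timesteps is cut off in the abstract above. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Detector outputs for one generated image, keyed by hypothetical names.
DetectorOutputs = Dict[str, float]


@dataclass
class FeedbackModule:
    """One detector-based feedback term (hypothetical structure, not the paper's)."""
    name: str
    loss_fn: Callable[[DetectorOutputs], float]  # maps detector outputs to a scalar penalty
    t_min: int  # first diffusion timestep at which the term is applied (inclusive)
    t_max: int  # last diffusion timestep at which the term is applied (inclusive)
    weight: float = 1.0


def combined_feedback_loss(modules: List[FeedbackModule],
                           outputs: DetectorOutputs, t: int) -> float:
    """Sum the weighted penalties of every module whose timestep window contains t."""
    return sum(m.weight * m.loss_fn(outputs)
               for m in modules if m.t_min <= t <= m.t_max)


def example_modules() -> List[FeedbackModule]:
    # Illustrative windows only (an assumption): structural cues (pose, HOI) at
    # noisier timesteps, fine-grained cues (identity, gaze) at less noisy ones.
    return [
        FeedbackModule("pose", lambda o: o["pose_error"], 500, 999),
        FeedbackModule("hoi", lambda o: 1.0 - o["hoi_score"], 500, 999),
        FeedbackModule("identity", lambda o: 1.0 - o["face_similarity"], 0, 499),
        FeedbackModule("gaze", lambda o: o["gaze_error"], 0, 499),
    ]


if __name__ == "__main__":
    outputs = {"pose_error": 0.25, "hoi_score": 0.5,
               "face_similarity": 0.75, "gaze_error": 0.125}
    modules = example_modules()
    print(combined_feedback_loss(modules, outputs, t=800))  # pose + HOI terms only
    print(combined_feedback_loss(modules, outputs, t=200))  # identity + gaze terms only
```

In an actual fine-tuning loop these penalties would be differentiable tensor losses computed by the detectors and added to the diffusion objective; plain floats are used here only to keep the sketch self-contained and runnable.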
Taxonomy
Topics: Image Retrieval and Classification Techniques
