GazeGen: Gaze-Driven User Interaction for Visual Content Generation
He-Yen Hsieh, Ziyun Li, Sai Qian Zhang, Wei-Te Mark Ting, Kao-Den Chang, Barbara De Salvo, Chiao Liu, H. T. Kung

TL;DR
GazeGen is a novel system that combines real-time gaze estimation with visual content generation, enabling intuitive gaze-driven editing and creation of images and videos on small edge devices.
Contribution
It introduces DFT Gaze, a lightweight, accurate gaze prediction model derived through knowledge distillation and personalization, integrated into a system for gaze-controlled visual content manipulation.
Findings
DFT Gaze achieves low angular error and latency on edge devices.
GazeGen enables real-time gaze-driven image editing and video creation.
The system demonstrates high accuracy and responsiveness in various scenarios.
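Angular error, the metric named in the first finding, is conventionally measured as the angle between the predicted and ground-truth gaze direction vectors. The following is a minimal sketch of that computation, assuming gaze is represented as 3D direction vectors; the function name `angular_error_deg` is illustrative and does not come from the paper.

```python
import math


def angular_error_deg(pred, target):
    """Return the angle in degrees between two 3D gaze direction vectors.

    Assumption: both inputs are length-3 sequences of floats; they do not
    need to be unit length, since the vectors are normalized here.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_p == 0.0 or norm_t == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))
```

A lower value means the predicted gaze direction lies closer to the true one; identical directions give an error of zero regardless of vector length.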
Abstract
We present GazeGen, a user interaction system that generates visual content (images and videos) for locations indicated by the user's eye gaze. GazeGen allows intuitive manipulation of visual content by targeting regions of interest with gaze. Using advanced techniques in object detection and generative AI, GazeGen performs gaze-controlled image adding/deleting, repositioning, and surface style changes of image objects, and converts static images into videos. Central to GazeGen is the DFT Gaze (Distilled and Fine-Tuned Gaze) agent, an ultra-lightweight model with only 281K parameters, performing accurate real-time gaze predictions tailored to individual users' eyes on small edge devices. GazeGen is the first system to combine visual content generation with real-time gaze estimation, made possible exclusively by DFT Gaze. This real-time gaze estimation enables various visual content…
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Tactile and Sensory Interactions · Virtual Reality Applications and Impacts
Methods: Knowledge Distillation
