Incorporating Eye-Tracking Signals Into Multimodal Deep Visual Models For Predicting User Aesthetic Experience In Residential Interiors
Chen-Ying Chien, Po-Chih Kuo

TL;DR
This paper presents a dual-branch CNN-LSTM model that combines visual features and eye-tracking data to predict aesthetic experiences in residential interiors, outperforming state-of-the-art video baselines.
Contribution
It introduces a novel multimodal deep learning framework that fuses eye-tracking signals with visual data for aesthetic evaluation, demonstrating the value of eye-tracking as privileged information.
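"Privileged information" refers to a signal (here, gaze) that is available while training but not when the model is deployed. This page does not say how the authors exploit it. One generic way to use such a signal is knowledge distillation: a teacher that sees video plus gaze produces soft targets for a student that sees video only. The NumPy sketch below shows the standard Hinton-style distillation loss under that assumption. It is not presented as the paper's method, and the names `distillation_loss`, `temperature`, and `alpha`, as well as the batch values, are hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with a temperature; subtracting the max keeps exp() stable."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Hinton-style knowledge distillation loss (generic technique, assumed here).

    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * cross-entropy(labels, student).
    The teacher is assumed to have seen video + gaze; the student sees video only.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1).mean()
    p_hard = softmax(student_logits)  # temperature 1 for the hard-label term
    ce = -np.log(p_hard[np.arange(len(labels)), labels]).mean()
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Hypothetical batch: 4 clips, binary high/low label for one aesthetic dimension.
teacher = np.array([[2.0, -1.0], [0.5, 0.2], [-1.0, 1.5], [0.0, 2.0]])
student = np.array([[1.0, 0.0], [0.3, 0.4], [-0.5, 0.5], [0.2, 1.0]])
labels = np.array([0, 0, 1, 1])
loss = distillation_loss(student, teacher, labels)
print(round(float(loss), 4))
```

The `temperature ** 2` factor keeps the gradient scale of the soft-target term roughly independent of the chosen temperature, which is the usual convention for this loss.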
Findings
The model achieves 72.2% accuracy on objective aesthetic dimensions (e.g., light).
The model attains 66.8% accuracy on subjective aesthetic dimensions (e.g., relaxation).
Eye-tracking data enhances prediction performance, especially for subjective assessments.
Abstract
Understanding how people perceive and evaluate interior spaces is essential for designing environments that promote well-being. However, predicting aesthetic experiences remains difficult due to the subjective nature of perception and the complexity of visual responses. This study introduces a dual-branch CNN-LSTM framework that fuses visual features with eye-tracking signals to predict aesthetic evaluations of residential interiors. We collected a dataset of 224 interior design videos paired with synchronized gaze data from 28 participants who rated 15 aesthetic dimensions. The proposed model attains 72.2% accuracy on objective dimensions (e.g., light) and 66.8% on subjective dimensions (e.g., relaxation), outperforming state-of-the-art video baselines and showing clear gains on subjective evaluation tasks. Notably, models trained with eye-tracking retain comparable performance when…
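To make the dual-branch idea concrete, here is a minimal NumPy sketch of the forward pass only. One branch runs a per-frame CNN followed by an LSTM over the frame features. The other runs an LSTM over per-frame gaze features. The two final hidden states are fused by concatenation. Everything specific is an assumption rather than the authors' configuration: the toy single-layer 3x3 convolution, the choice of gaze features (x, y, pupil size), late fusion by concatenation, the layer sizes, and one sigmoid output per aesthetic dimension. Function names such as `dual_branch_forward` are hypothetical, and `sliding_window_view` requires NumPy 1.20 or newer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv_features(frame, kernels):
    """Toy CNN stand-in: valid 3x3 convolution, ReLU, global average pooling.

    frame: (H, W, C) array; kernels: (K, 3, 3, C) array -> returns a (K,) vector.
    """
    # Windows over the two spatial axes: shape (H-2, W-2, C, 3, 3).
    windows = np.lib.stride_tricks.sliding_window_view(frame, (3, 3), axis=(0, 1))
    maps = np.einsum("hwcij,kijc->hwk", windows, kernels)
    return np.maximum(maps, 0.0).mean(axis=(0, 1))


def init_lstm(d_in, d_hid):
    """Random LSTM weights; gates stacked in the order input, forget, cell, output."""
    scale = 1.0 / np.sqrt(d_in + d_hid)
    W = rng.normal(0.0, scale, (4 * d_hid, d_in))
    U = rng.normal(0.0, scale, (4 * d_hid, d_hid))
    b = np.zeros(4 * d_hid)
    return W, U, b


def lstm_last_hidden(xs, W, U, b):
    """Run a single-layer LSTM over xs (T, D) and return the final hidden state."""
    d_hid = U.shape[1]
    h, c = np.zeros(d_hid), np.zeros(d_hid)
    for x in xs:
        i, f, g, o = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def dual_branch_forward(frames, gaze, params):
    """Visual branch (CNN per frame -> LSTM) and gaze branch (LSTM), fused by concatenation."""
    visual_seq = np.stack([conv_features(f, params["kernels"]) for f in frames])  # (T, K)
    h_vis = lstm_last_hidden(visual_seq, *params["vis_lstm"])
    h_gaze = lstm_last_hidden(gaze, *params["gaze_lstm"])
    fused = np.concatenate([h_vis, h_gaze])
    # One sigmoid output per aesthetic dimension (binary high/low: an assumption).
    return sigmoid(params["head_W"] @ fused + params["head_b"])


# Hypothetical sizes: 8 frames of 16x16 RGB, 3 gaze features per frame, 15 dimensions.
T, H, W_img, C, K, HID, N_DIMS = 8, 16, 16, 3, 4, 8, 15
params = {
    "kernels": rng.normal(0.0, 0.1, (K, 3, 3, C)),
    "vis_lstm": init_lstm(K, HID),
    "gaze_lstm": init_lstm(3, HID),
    "head_W": rng.normal(0.0, 0.1, (N_DIMS, 2 * HID)),
    "head_b": np.zeros(N_DIMS),
}
frames = rng.random((T, H, W_img, C))
gaze = rng.random((T, 3))  # e.g. normalized x, y and pupil size (assumed features)
probs = dual_branch_forward(frames, gaze, params)
print(probs.shape)  # (15,)
```

Concatenating the final hidden states is the simplest late-fusion choice. Attention-based or intermediate fusion are common alternatives, and the sketch omits training entirely.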
Taxonomy
Topics: Visual Attention and Saliency Detection · Aesthetic Perception and Analysis · Gaze Tracking and Assistive Technology
