# Design of Realistic and Artistically Expressive 3D Facial Models for Film AIGC: A Cross-Modal Framework Integrating Audience Perception Evaluation

**Authors:** Yihuan Tian, Xinyang Li, Zuling Cheng, Yang Huang, Tao Yu

PMC · DOI: 10.3390/s25154646 · Sensors (Basel, Switzerland) · 2025-07-26

## TL;DR

This paper introduces a framework that generates realistic and artistically expressive 3D facial models for film from single-view semantic masks. Its cross-modal approach improves both lighting adaptation and artistic quality.

## Contribution

A cross-modal 3D face generation framework that integrates audience perception evaluation and physical rendering for improved realism and artistic quality.

## Key findings

- On the CelebAMask-HQ dataset, the method achieves SSIM = 0.892 (a 37.6% improvement over the baseline) with FID = 40.6.
- In a perceptual evaluation with 1000 viewers, generated faces scored 8/10 for realism and 7/10 for aesthetics.
- The framework uses semantic masks and physical rendering to decouple lighting from geometry, which makes it robust to complex lighting variations.
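This summary does not give the paper's rendering equation. The sketch below therefore illustrates only the general idea of keeping reflectance and lighting as separate factors, under explicit assumptions. It models a Lambertian surface lit by one directional light plus a constant ambient term, and it composites samples along a ray with standard NeRF alpha compositing. The function names `lambertian_shading` and `render_ray` are hypothetical. In the authors' framework, a network would predict densities, albedo, and normals rather than receive them as inputs.

```python
import numpy as np

def lambertian_shading(normals, light_dir, ambient=0.2):
    """Assumed lighting model: constant ambient term plus one directional light.

    Depends only on geometry (normals) and lighting, never on surface colour.
    """
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    diffuse = np.clip(n @ l, 0.0, None)           # max(0, n . l) per sample
    return ambient + (1.0 - ambient) * diffuse    # shape (num_samples,)

def render_ray(sigmas, deltas, albedos, shading):
    """NeRF-style alpha compositing of (albedo * shading) along one ray.

    sigmas: (N,) densities, deltas: (N,) sample spacings,
    albedos: (N, 3) reflectance, shading: (N,) lighting factor.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # transmittance T_i
    weights = trans * alphas                                         # contribution of each sample
    radiance = albedos * shading[:, None]                            # reflectance x lighting
    return (weights[:, None] * radiance).sum(axis=0), weights

# Usage: identical geometry and albedo, rendered under two different lights.
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
albedos = np.array([[0.8, 0.6, 0.5], [0.8, 0.6, 0.5]])
sigmas = np.array([50.0, 50.0])
deltas = np.array([0.1, 0.1])
front, w = render_ray(sigmas, deltas, albedos, lambertian_shading(normals, [0, 0, 1]))
side, _ = render_ray(sigmas, deltas, albedos, lambertian_shading(normals, [1, 0, 0]))
```

Albedo and shading enter only as a product, so relighting changes just the `shading` argument. The geometry (`sigmas`, `normals`) and the reflectance (`albedos`) stay fixed. In the usage lines, moving the light from frontal to grazing scales the pixel down to the ambient level while leaving the albedo untouched.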

## Abstract

The rise of virtual production has created an urgent need for 3D face generation schemes for cinema and immersive media that are both efficient and high-fidelity. Existing methods, however, are often limited by lighting–geometry coupling, multi-view dependency, and insufficient artistic quality. To address this, this study proposes a cross-modal 3D face generation framework based on single-view semantic masks. It uses a Swin Transformer for multi-level feature extraction and combines it with NeRF for illumination-decoupled rendering. We use physical rendering equations to explicitly separate surface reflectance from ambient lighting, which achieves robust adaptation to complex lighting variations. In addition, to address geometric errors across illumination scenes, we construct geometric prior constraint networks. With the help of semantic masks, these networks map 2D facial features into a 3D parameter space, and the mapped features serve as regularization terms. On the CelebAMask-HQ dataset, this method achieves a leading SSIM of 0.892 (a 37.6% improvement over the baseline) with an FID of 40.6. The generated faces excel in symmetry and detail fidelity. In a perceptual evaluation with 1000 viewers, they scored 8/10 for realism and 7/10 for aesthetics. By combining physical-level illumination decoupling with semantic geometric priors, this paper establishes a quantifiable feedback mechanism between objective metrics and human aesthetic evaluation, providing a new paradigm for aesthetic quality assessment of AI-generated content.
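The abstract reports SSIM = 0.892 and FID = 40.6. Higher SSIM is better, and lower FID is better. A minimal NumPy sketch of both formulas follows, for readers who want to see what these numbers measure. This is a simplification and not the evaluation code used in the paper. Standard SSIM averages the statistic over local Gaussian windows, whereas `global_ssim` computes it once over the whole image. FID is normally computed on Inception-v3 features, whereas the hypothetical `frechet_distance` accepts any feature arrays. Values from this sketch are therefore not comparable to the paper's.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM computed once over the whole image (no sliding window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2          # stabilises the luminance term
    c2 = (k2 * data_range) ** 2          # stabilises the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (N, D) feature sets.

    d^2 = ||mu_a - mu_b||^2 + Tr(S_a) + Tr(S_b) - 2 Tr((S_a S_b)^{1/2})
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # np.cov squeezes a 1x1 result to 0-d, so force 2-D for matrix products.
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # S_a S_b is similar to a PSD matrix, so its eigenvalues are real and
    # non-negative up to round-off; the trace of its square root is the
    # sum of their square roots.
    eig = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Usage on synthetic inputs (not data from the paper).
rng = np.random.default_rng(0)
img = rng.random((32, 32))
noisy = np.clip(img + 0.2 * rng.standard_normal(img.shape), 0.0, 1.0)
feats = rng.standard_normal((500, 4))
```

Identical images give an SSIM of 1, and identical feature sets give a Fréchet distance of 0. Shifting every feature by 1 while keeping the covariance fixed adds exactly D (here 4) to the distance.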

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12349015/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12349015/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/PMC12349015/full.md

---
Source: https://tomesphere.com/paper/PMC12349015