FED-Bench: A Cross-Granular Benchmark for Disentangled Evaluation of Facial Expression Editing
Fengjian Xue, Xuecheng Wu, Heli Sun, Yunyun Shi, Shi Chen, Liangyu Fu, Jinheng Xie, Dingkang Yang, Hao Wang, Junxiao Xue, Liang He

TL;DR
FED-Bench introduces a benchmark and evaluation protocol for facial expression editing. It addresses two gaps in existing work: the lack of high-quality facial images with matching editing instructions, and evaluation metrics that are biased toward lazy or overfit editing.
Contribution
It provides a benchmark of 747 triplets built with a scalable pipeline, together with a cross-granularity evaluation suite (FED-Score), and demonstrates its utility by improving model performance through additional training data.
Findings
Current models struggle with high-fidelity, accurate expression editing.
FED-Score effectively disentangles evaluation dimensions, reducing bias.
Fine-grained instruction following is the main bottleneck in current approaches.
Abstract
Facial expression image editing requires fine-grained control to strictly preserve human identity and background while precisely manipulating expression. However, existing editing benchmarks primarily focus on general scenarios, lacking high-quality facial images and corresponding editing instructions. Furthermore, current evaluation metrics exhibit systemic biases in this task, often favoring lazy editing or overfit editing. To bridge these gaps, we propose FED-Bench, a comprehensive benchmark featuring rigorous testing and an accurate evaluation suite. First, we carefully construct a benchmark of 747 triplets through a cascaded and scalable pipeline, each comprising an original image, an editing instruction, and a ground-truth image for precise evaluation. Second, we introduce FED-Score, a cross-granularity evaluation protocol that disentangles assessment into three dimensions:…
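The triplet structure and the "lazy editing" bias described in the abstract can be illustrated with a toy sketch. This is not FED-Score, whose three dimensions are not listed in the text above. The names `EditTriplet`, `similarity`, `preservation_only_score`, and `ground_truth_score` are hypothetical, and images are stood in for by flat lists of pixel intensities. The sketch only shows why a metric that rewards closeness to the original image ranks an unedited output above a faithful edit, and why a ground-truth image helps.

```python
from dataclasses import dataclass
from typing import List

# Toy stand-in for an image: a flat list of pixel intensities in [0, 1].
Image = List[float]


@dataclass(frozen=True)
class EditTriplet:
    """Hypothetical container for one benchmark item (names are assumptions)."""
    original: Image
    instruction: str
    ground_truth: Image


def similarity(a: Image, b: Image) -> float:
    """Return 1 minus the mean absolute difference; 1.0 means identical."""
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def preservation_only_score(triplet: EditTriplet, output: Image) -> float:
    """Illustrative biased metric: rewards closeness to the original only."""
    return similarity(triplet.original, output)


def ground_truth_score(triplet: EditTriplet, output: Image) -> float:
    """Illustrative reference metric: rewards closeness to the ground truth."""
    return similarity(triplet.ground_truth, output)


triplet = EditTriplet(
    original=[0.2, 0.2, 0.8, 0.8],
    instruction="make the person smile",
    ground_truth=[0.2, 0.2, 0.4, 0.4],
)

lazy_output = list(triplet.original)          # model returns the input unchanged
faithful_output = list(triplet.ground_truth)  # model performs the requested edit

# The preservation-only metric prefers the lazy output,
# while the ground-truth comparison prefers the faithful edit.
lazy_wins = (preservation_only_score(triplet, lazy_output)
             > preservation_only_score(triplet, faithful_output))
faithful_wins = (ground_truth_score(triplet, faithful_output)
                 > ground_truth_score(triplet, lazy_output))
```

In this toy setting both `lazy_wins` and `faithful_wins` are true. Real metrics operate on learned features rather than raw pixels, so the sketch is only a schematic of the failure mode.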
