NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

Ziyu Yang; Sucheng Ren; Zongwei Wu; Nanxuan Zhao; Junle Wang; Jing Qin; Shengfeng He

arXiv:2308.12163·cs.CV·August 24, 2023

TL;DR

This paper introduces NPF-200, a large-scale multi-modal dataset of non-photorealistic videos with eye fixations, and proposes NPSNet, a frequency-aware multi-modal saliency detection model that advances understanding of human attention in such videos.

Contribution

The work provides the first large-scale multi-modal dataset for non-photorealistic videos and develops a novel frequency-aware saliency detection model, enhancing research in visual attention and media design.

Findings

01. The NPF-200 dataset contains diverse, high-quality non-photorealistic videos with soundtracks.

02. NPSNet achieves state-of-the-art performance in multi-modal saliency detection.

03. Analysis reveals strengths and weaknesses of current multi-modal network designs.

Abstract

Non-photorealistic videos are in demand with the wave of the metaverse, but they lack sufficient research. This work takes a step toward understanding how humans perceive non-photorealistic videos through eye fixations (i.e., saliency detection), which is critical for enhancing media production, artistic design, and game user experience. To fill the gap left by the absence of a suitable dataset for this line of research, we present NPF-200, the first large-scale multi-modal dataset of purely non-photorealistic videos with eye fixations. Our dataset has three characteristics: 1) it contains soundtracks, which are essential according to vision and psychological studies; 2) it includes diverse semantic content, and its videos are of high quality; 3) it has rich motions across and within videos. We conduct a series of analyses to gain deeper insights into this task and compare several…

Code & Models

Repositories

yangziyu/npf200
PyTorch · Official

Taxonomy

Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Virtual Reality Applications and Impacts