Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding

Jiaxiu Li; Kun Li; Jia Li; Guoliang Chen; Dan Guo; Meng Wang

arXiv:2309.06176 · cs.CV · September 13, 2023


TL;DR

This paper introduces DPTMO, a dual-path network that improves fine-grained make-up video grounding by capturing detailed semantic cues through query-agnostic and query-guided features, leading to more accurate localization.

Contribution

The paper proposes a novel dual-path, proposal-based framework that captures fine-grained semantic details in make-up videos and outperforms existing methods on the YouMakeup dataset.

Findings

01. DPTMO outperforms previous methods on the YouMakeup dataset.

02. The dual-path structure enhances semantic comprehension of make-up activities.

03. Jointly optimizing the two proposal sets improves timestamp-prediction accuracy.
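The third finding concerns combining two proposal sets to predict timestamps. As a rough illustration only, the sketch below assumes each path produces an N×N score map over candidate segments (cell (i, j) covers clips i through j, valid only when i ≤ j), fuses the two maps with a fixed weighted average, and converts the best-scoring cell into start and end times. The helper name `fuse_and_localize`, the weight `alpha`, and the uniform clip length are hypothetical; the source does not state how DPTMO combines or jointly trains its two proposal sets.

```python
from typing import List, Tuple

# score_map[i][j]: score of the candidate segment spanning clips i..j (i <= j).
ScoreMap = List[List[float]]


def fuse_and_localize(
    agnostic_map: ScoreMap,
    guided_map: ScoreMap,
    video_duration: float,
    alpha: float = 0.5,
) -> Tuple[float, float, float]:
    """Hypothetical helper: fuse two proposal score maps, return (start_sec, end_sec, score).

    Assumptions (not from the paper): weighted-average fusion with weight `alpha`
    and clips of equal length covering the whole video.
    """
    n = len(agnostic_map)
    if n == 0 or len(guided_map) != n:
        raise ValueError("score maps must be non-empty and the same size")
    clip_len = video_duration / n
    best_i, best_j, best_score = 0, 0, float("-inf")
    for i in range(n):
        for j in range(i, n):  # upper triangle only: start clip <= end clip
            score = alpha * agnostic_map[i][j] + (1.0 - alpha) * guided_map[i][j]
            if score > best_score:
                best_i, best_j, best_score = i, j, score
    # Clip i starts at i * clip_len; clip j ends at (j + 1) * clip_len.
    return best_i * clip_len, (best_j + 1) * clip_len, best_score
```

For a 3-clip, 30-second video, each clip spans 10 s, so a winning cell (1, 2) maps to the 10 s to 30 s segment. A fixed average is the simplest possible fusion; a learned weighting, or a training loss applied to both maps, would be closer to the joint optimization the finding describes.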

Abstract

Given a long video, make-up temporal video grounding (MTVG) aims to localize the target video segment that is semantically related to a sentence describing a make-up activity. Compared with the general video grounding task, MTVG focuses on meticulous actions and changes on the face. A make-up instruction step, which usually involves subtle differences in products and facial areas, is more fine-grained than general activities (e.g., cooking and furniture assembly). As a result, existing general approaches cannot locate the target activity effectively. More specifically, existing proposal generation modules are not yet well developed for providing the semantic cues needed for fine-grained make-up semantic comprehension. To tackle this issue, we propose an effective proposal-based framework named Dual-Path Temporal Map Optimization Network (DPTMO) to capture fine-grained multimodal semantic…
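The abstract is cut off before it describes the temporal map itself. In many proposal-based grounding models (2D-TAN is a well-known example), a temporal map is an N×N grid in which cell (i, j) holds a feature for the candidate segment spanning clips i to j. The sketch below assumes that layout, builds each cell by element-wise max pooling over the clips it covers, and scores cells against a sentence embedding with cosine similarity as a stand-in for a query-guided path. The names `build_temporal_map`, `cosine`, and `score_map`, and the pooling and scoring choices, are illustrative assumptions, not DPTMO's actual modules.

```python
import math
from typing import List, Optional

Vector = List[float]
TemporalMap = List[List[Optional[Vector]]]


def build_temporal_map(clips: List[Vector]) -> TemporalMap:
    """Hypothetical helper: tmap[i][j] is the max-pooled feature of clips i..j.

    Cells with i > j do not correspond to a valid segment and stay None.
    """
    n = len(clips)
    tmap: TemporalMap = [[None] * n for _ in range(n)]
    for i in range(n):
        pooled = list(clips[i])  # segment consisting of a single clip
        tmap[i][i] = pooled
        for j in range(i + 1, n):
            # Extend the segment by clip j: element-wise max with the previous cell,
            # so each cell costs O(d) instead of re-pooling every covered clip.
            pooled = [max(a, b) for a, b in zip(pooled, clips[j])]
            tmap[i][j] = pooled
    return tmap


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def score_map(tmap: TemporalMap, query: Vector) -> List[List[float]]:
    """Hypothetical query-guided scoring: cosine similarity per valid cell, -inf elsewhere."""
    return [
        [cosine(cell, query) if cell is not None else float("-inf") for cell in row]
        for row in tmap
    ]
```

In a trained model the clip and query features would be learned tensors and the scoring would be done by a network; plain lists keep the sketch dependency-free while showing how one grid enumerates every contiguous candidate segment.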


Code & Models

Repositories

lijiaxiuHFUT/DPTMO (PyTorch, official)


Taxonomy

Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media · Human Pose and Action Recognition