Exploring Motion and Appearance Information for Temporal Sentence Grounding
Daizong Liu, Xiaoye Qu, Pan Zhou, Yang Liu

TL;DR
This paper introduces the Motion-Appearance Reasoning Network (MARN), which combines motion-aware and appearance-aware object features to improve temporal sentence grounding by better modeling object relations across frames.
Contribution
The paper proposes a dual-branch network that separately encodes and integrates motion and appearance features for enhanced temporal grounding accuracy.
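The dual-branch idea can be illustrated with a small NumPy sketch. It assumes that per-object appearance features (for example, region features from a detector such as Faster R-CNN) and per-object motion features are already extracted for every frame. The projection matrices, the self-attention relation step, the mean-pooling over objects, and the concatenation-based fusion are all hypothetical choices made for illustration. They are not MARN's actual layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relation_reasoning(objs, w_q, w_k, w_v):
    """Hypothetical relation step: self-attention over the objects of one frame.

    objs: (num_objects, dim) features of a single frame.
    Returns (num_objects, hidden) updated object features.
    """
    q, k, v = objs @ w_q, objs @ w_k, objs @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Residual connection in the projected space, plus relation-weighted context.
    return v + attn @ v

def encode_branch(frames, params):
    """Run relation reasoning per frame, then mean-pool objects to one vector per frame.

    frames: (num_frames, num_objects, dim) -> (num_frames, hidden)
    """
    return np.stack([relation_reasoning(f, *params).mean(axis=0) for f in frames])

def dual_branch(appearance, motion, app_params, mot_params):
    # Encode the two branches separately, then fuse by concatenation (assumed fusion).
    app = encode_branch(appearance, app_params)
    mot = encode_branch(motion, mot_params)
    return np.concatenate([app, mot], axis=-1)

rng = np.random.default_rng(0)
T, N, D_APP, D_MOT, H = 8, 5, 16, 12, 32  # frames, objects per frame, feature sizes (arbitrary)
appearance = rng.standard_normal((T, N, D_APP))  # stand-in appearance features
motion = rng.standard_normal((T, N, D_MOT))      # stand-in motion features
app_params = [rng.standard_normal((D_APP, H)) * 0.1 for _ in range(3)]
mot_params = [rng.standard_normal((D_MOT, H)) * 0.1 for _ in range(3)]
fused = dual_branch(appearance, motion, app_params, mot_params)
print(fused.shape)  # (8, 64)
```

Keeping the branches separate until the final fusion means each modality gets its own relation weights, which mirrors the "separate modeling" finding below; a real model would learn these weights rather than sample them.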
Findings
MARN significantly outperforms previous methods on Charades-STA and TACoS datasets.
Incorporating motion information helps distinguish frames that differ only subtly in appearance.
Separate modeling of motion and appearance leads to more accurate object relation reasoning.
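Results on Charades-STA and TACoS are commonly reported as "R@n, IoU=m": the fraction of queries for which at least one of the top-n predicted segments overlaps the ground-truth segment with a temporal IoU of at least m. The pure-Python sketch below covers the R@1 case. The function names and sample segments are illustrative and are not numbers from the paper.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(preds, gts, threshold):
    """Fraction of queries whose top-1 predicted segment reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)

preds = [(2.0, 8.0), (10.0, 15.0), (0.0, 4.0)]
gts = [(3.0, 9.0), (20.0, 25.0), (1.0, 5.0)]
print([round(temporal_iou(p, g), 3) for p, g in zip(preds, gts)])  # [0.714, 0.0, 0.6]
print(recall_at_1(preds, gts, 0.5), recall_at_1(preds, gts, 0.7))  # 2/3 and 1/3
```

A stricter threshold (0.7 versus 0.5) rewards predictions whose boundaries are precise, which is where better discrimination between near-identical frames would be expected to matter most.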
Abstract
This paper addresses temporal sentence grounding. Previous works typically solve this task by learning frame-level video features and aligning them with the textual information. A major limitation of these works is that, because they extract features at the frame level, they fail to distinguish ambiguous video frames with subtle appearance differences. Recently, a few methods have adopted Faster R-CNN to extract detailed object features in each frame in order to differentiate frames with fine-grained appearance similarities. However, the object-level features extracted by Faster R-CNN lack motion information, since the object detection model performs no temporal modeling. To solve this issue, we propose a novel Motion-Appearance Reasoning Network (MARN), which incorporates both motion-aware and appearance-aware object features to better reason about object relations for modeling the activity among successive frames.…
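Temporal sentence grounding asks for the start and end of the video segment that a sentence describes. The abstract is truncated before it explains how MARN turns its features into such a segment, so the following is only a generic illustration of a grounding head, not the paper's method. It scores every frame as a start and as an end, normalizes both with softmax, and picks the (start, end) pair with the highest joint probability subject to start ≤ end. The elementwise interaction with the sentence vector, the weight vectors `w_start` and `w_end`, and the `max_len` cap are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_span(frame_feats, sent_feat, w_start, w_end, max_len=None):
    """Return (start_idx, end_idx, prob) for the most likely span with start <= end.

    frame_feats: (T, D) per-frame video features; sent_feat: (D,) sentence embedding.
    """
    interact = frame_feats * sent_feat       # hypothetical cross-modal interaction
    p_start = softmax(interact @ w_start)    # P(start = i) for each frame i
    p_end = softmax(interact @ w_end)        # P(end = j) for each frame j
    joint = np.triu(np.outer(p_start, p_end))  # zero out spans with end < start
    if max_len is not None:
        joint = np.tril(joint, k=max_len - 1)  # keep spans covering at most max_len frames
    i, j = np.unravel_index(np.argmax(joint), joint.shape)
    return int(i), int(j), float(joint[i, j])

rng = np.random.default_rng(0)
T, D = 10, 8
frames = rng.standard_normal((T, D))    # stand-in fused video features
sentence = rng.standard_normal(D)       # stand-in sentence embedding
w_s, w_e = rng.standard_normal(D), rng.standard_normal(D)
start, end, prob = predict_span(frames, sentence, w_s, w_e, max_len=5)
seconds_per_frame = 3.0                 # hypothetical feature sampling interval
print(start * seconds_per_frame, (end + 1) * seconds_per_frame, prob)
```

Scoring the joint table rather than taking the two argmaxes independently guarantees a valid segment even when the start and end distributions disagree.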
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization
Methods: Softmax · RoIPool · Convolution · Region Proposal Network · Faster R-CNN
