Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization
Ludan Ruan (1), Jieting Chen (1), Yuqing Song (1), Shizhe Chen (2), Qin Jin (1) ((1) Renmin University of China, (2) INRIA)

TL;DR
This paper splits Entities Object Localization into two stages. A Unified Multi-modal Pre-training Model (UMPM) generates object-rich captions, and a fine-tuned MDETR detector with post-processing grounds the mentioned objects. The system reports state-of-the-art results on both challenge sub-tasks.
Contribution
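Instead of jointly training caption generation and object grounding in one framework, the paper treats them as two separate stages and improves each independently: a new multi-modal pre-training model (UMPM) for captioning, and a fine-tuned MDETR with a post-processing step for grounding.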
Findings
Achieved 72.57% localization accuracy on sub-task I
Attained 0.2477 F1 score on sub-task II
Achieved state-of-the-art performance on both sub-tasks of the Entities Object Localization challenge at ActivityNet 2021
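The first two findings are a localization accuracy and an F1 score. This summary does not define either metric, so the following is only a minimal sketch of how such numbers are commonly computed. It assumes a predicted box counts as correct when its intersection-over-union (IoU) with the reference box exceeds 0.5, and that F1 is the harmonic mean of precision and recall. The function names and the 0.5 threshold are illustrative assumptions, not the challenge's official evaluation code.

```python
from typing import Sequence, Tuple

# A box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_accuracy(
    predicted: Sequence[Box],
    reference: Sequence[Box],
    threshold: float = 0.5,  # assumed threshold, not taken from the source
) -> float:
    """Fraction of reference boxes matched by the paired prediction."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must be paired one-to-one")
    if not reference:
        return 0.0
    hits = sum(1 for p, r in zip(predicted, reference) if iou(p, r) > threshold)
    return hits / len(reference)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0
```

Under these assumptions, an accuracy of 72.57% would mean that roughly 73 of every 100 annotated objects were localized with sufficient overlap.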
Abstract
Entities Object Localization (EOL) aims to evaluate how grounded or faithful a description is. The task consists of caption generation and object grounding. Previous works tackle this problem by jointly training the two modules in a single framework, which limits the complexity of each module. In this work, we therefore propose to split the two modules into two stages and improve each separately to boost the performance of the whole system. For caption generation, we propose a Unified Multi-modal Pre-training Model (UMPM) that generates event descriptions with rich objects for better localization. For object grounding, we fine-tune the state-of-the-art detection model MDETR and design a post-processing method to make the grounding results more faithful. Our overall system achieves state-of-the-art performance on both sub-tasks in the Entities Object Localization challenge at ActivityNet…
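The decoupling described in the abstract can be illustrated with a small sketch in which the captioner and the grounder are independent, swappable components. Everything named here is hypothetical: `Captioner`, `Grounder`, `Grounding`, `filter_by_score`, and `run_two_stage` are not from the paper, and the score-threshold filter is only a stand-in, since the abstract does not describe the actual post-processing method.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Grounding:
    phrase: str   # object phrase taken from the caption
    box: Box      # predicted location of that object
    score: float  # grounder confidence in [0, 1]


# Stage 1: maps a video identifier to a caption (UMPM plays this role in the paper).
Captioner = Callable[[str], str]
# Stage 2: maps (video identifier, caption) to grounded phrases (MDETR in the paper).
Grounder = Callable[[str, str], List[Grounding]]


def filter_by_score(groundings: List[Grounding], min_score: float = 0.5) -> List[Grounding]:
    """Placeholder post-processing: drop low-confidence boxes (an assumption)."""
    return [g for g in groundings if g.score >= min_score]


def run_two_stage(
    video_id: str,
    captioner: Captioner,
    grounder: Grounder,
    post_process: Callable[[List[Grounding]], List[Grounding]] = filter_by_score,
) -> Tuple[str, List[Grounding]]:
    """Run captioning first, then ground the caption's objects, then post-process."""
    caption = captioner(video_id)
    raw = grounder(video_id, caption)
    return caption, post_process(raw)


if __name__ == "__main__":
    # Toy stand-ins for the two stages, used only to show the data flow.
    toy_captioner = lambda vid: "a man throws a ball"
    toy_grounder = lambda vid, cap: [
        Grounding("man", (10.0, 20.0, 110.0, 220.0), 0.9),
        Grounding("ball", (150.0, 40.0, 170.0, 60.0), 0.3),
    ]
    print(run_two_stage("video_0001", toy_captioner, toy_grounder))
```

Because the two stages only share the caption string, either one can be replaced or retrained without touching the other, which is the flexibility the abstract argues joint training lacks.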
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: MDETR
