Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding

Erica K. Shimomoto; Edison Marrese-Taylor; Hiroya Takamura; Ichiro Kobayashi; Hideki Nakayama; Yusuke Miyao

arXiv:2209.13359 · cs.CV · May 26, 2023 · 1 citation

TL;DR

This paper studies how pre-trained language models (PLMs) affect Temporal Video Grounding (TVG). It shows that PLMs improve TVG performance without any change to the visual inputs, and that NLP adapters make fine-tuning them more efficient.

Contribution

It provides a systematic study of PLM effects in TVG and evaluates parameter-efficient adapters, highlighting their effectiveness and practical benefits.

Findings

01 PLMs significantly improve TVG performance without changing visual inputs.

02 NLP adapters can match state-of-the-art results at lower computational cost.

03 Which adapter performs best depends on the scenario.

Abstract

This paper explores the task of Temporal Video Grounding (TVG) where, given an untrimmed video and a natural language sentence query, the goal is to recognize and determine temporal boundaries of action instances in the video described by the query. Recent works tackled this task by improving query inputs with large pre-trained language models (PLM) at the cost of more expensive training. However, the effects of this integration are unclear, as these works also propose improvements in the visual inputs. Therefore, this paper studies the effects of PLMs in TVG and assesses the applicability of parameter-efficient training with NLP adapters. We couple popular PLMs with a selection of existing approaches and test different adapters to reduce the impact of the additional parameters. Our results on three challenging datasets show that, without changing the visual inputs, TVG models greatly…
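
To give a rough sense of what "parameter-efficient" means here, the sketch below counts the trainable parameters of a generic bottleneck adapter and compares them with a frozen PLM. It is an illustration under stated assumptions, not the configuration used in the paper: the function names, the bottleneck size of 64, the two-adapters-per-layer placement, and the approximate 110M-parameter, 12-layer, 768-dimensional encoder are all hypothetical choices made for this example.

```python
def bottleneck_adapter_params(hidden_size: int, bottleneck_size: int) -> int:
    """Count the parameters of one generic bottleneck adapter.

    Assumed layout: a down-projection (hidden -> bottleneck) and an
    up-projection (bottleneck -> hidden), each with a bias vector.
    """
    down = hidden_size * bottleneck_size + bottleneck_size
    up = bottleneck_size * hidden_size + hidden_size
    return down + up


def trainable_fraction(
    plm_params: int,
    num_layers: int,
    adapters_per_layer: int,
    hidden_size: int,
    bottleneck_size: int,
) -> float:
    """Share of all parameters that is trained when the PLM is frozen
    and only the adapters are updated."""
    per_adapter = bottleneck_adapter_params(hidden_size, bottleneck_size)
    adapter_total = num_layers * adapters_per_layer * per_adapter
    return adapter_total / (plm_params + adapter_total)


# Hypothetical values, roughly the size of a BERT-base-style encoder.
fraction = trainable_fraction(
    plm_params=110_000_000,  # assumed approximate PLM size
    num_layers=12,           # assumed number of transformer layers
    adapters_per_layer=2,    # assumed placement: two adapters per layer
    hidden_size=768,         # assumed hidden dimension
    bottleneck_size=64,      # assumed adapter bottleneck dimension
)
print(f"Trainable share with adapters only: {fraction:.2%}")
```

With these assumed values, the adapters make up about 2% of all parameters, which is the only portion updated during training while the PLM itself stays frozen.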

Code & Models

Repositories

ericashimomoto/parameter-efficient-tvg
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Test